Abstract- Today, better numerical approximations are required for multi-dimensional SDEs to improve on the poor performance of standard Monte Carlo integration. In finance, it is usually the weak convergence property of numerical discretizations that matters most, because financial applications are mostly concerned with the accurate estimation of expected payoffs. However, recent studies on hedging, portfolio optimization, and the valuation of exotic options show that the strong convergence property plays a crucial role.

When one prices an exotic option or wants to approximate a portfolio, the particular SDEs used are not important. What really matters is that the SDEs correctly approximate the real distribution of the process. Using this principle, this research suggests that, instead of considering a given non-commutative multi-dimensional SDE that represents our process, we consider another SDE that has the same distribution but a different strong convergence order. The new SDE has an extra process θ, and by manipulating it one makes the system commutative, thereby avoiding the simulation of the Levy Area (which is extremely expensive in computational time). The solutions of the new SDE coincide with those of the original SDE in a weak sense, that is, in distribution. If certain conditions are satisfied, the θ scheme gives first order strong convergence without the simulation of the Levy Area. By contrast, for the original non-commutative SDE, the Milstein scheme neglecting the Levy Area has strong convergence of order 0.5. If the conditions are not satisfied, this study confirms experimentally that the θ scheme gives a better strong approximation than the standard Milstein scheme applied to the original SDEs (both schemes neglecting the simulation of the Levy Area).

AMS subject classifications: 60G20, 65CXX, 65C20, 37H10, 
41A25. 

Keywords- Discrete time approximation, stochastic schemes, stochastic volatility models, Milstein Scheme, Levy Area, θ scheme, Orthogonal Milstein Scheme, orthogonal transformation, strong convergence.

I. Introduction 

Strong convergence properties of discretizations of stochastic differential equations (SDEs) are very important in finance. Usually, it is the weak convergence property of numerical discretizations that matters most, because financial applications are mostly concerned with the accurate estimation of expected payoffs. However, in recent studies on hedging, portfolio optimization, and the valuation of exotic options, the strong convergence property plays a crucial role. One example is the Multilevel Monte Carlo path simulation method (MSL-MC [13], [14]). Using strong convergence properties, MSL-MC substantially reduces the computational cost of pricing exotic options with stochastic volatility models. For some exotic options, this research shows that MSL-MC is 55 times more efficient than the standard Monte Carlo method using the Euler discretization. It reduces the computation time by 90%. As with MSL-MC, other research in the literature has shown that strong convergence properties are very useful for hedging and portfolio optimization.

For time-discrete approximations, the Euler-Maruyama scheme has strong convergence of order 0.5 for all multi-dimensional SDEs (under the usual Lipschitz and linear growth conditions). The next Taylor approximation, the Milstein scheme, gives first order strong convergence for all 1-dimensional systems (driven by one Wiener process). However, for two or more Wiener processes, as in stochastic volatility models and correlated multi-dimensional SDEs, the second-order iterated integrals (Levy Areas) cannot be expressed exactly in terms of the Wiener increments, and the Milstein scheme neglecting the Levy Area usually gives the same strong order of convergence as the Euler-Maruyama scheme. The numerical difficulty with the Milstein scheme is how to simulate the Levy Area efficiently, since it is extremely expensive in computational time.
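The two strong orders quoted above can be seen in a small experiment. The Python sketch below is our own illustration, not code from this paper: it uses the scalar geometric Brownian motion dX = μX dt + sX dW, whose exact solution is known, drives both schemes with the same Brownian path, and estimates each order from the errors at Δt = 1/16 and Δt = 1/64. The function name `strong_errors` and all parameter values are illustrative assumptions.

```python
import math
import random


def strong_errors(mu=0.05, s=0.5, x0=1.0, T=1.0, n_fine=64,
                  coarse=(16, 64), n_paths=2000, seed=1):
    """Mean absolute error at time T of Euler-Maruyama and Milstein for
    dX = mu*X dt + s*X dW, measured against the exact GBM solution driven by
    the same Brownian path. Returns {n_steps: (euler_error, milstein_error)}."""
    rng = random.Random(seed)
    sums = {n: [0.0, 0.0] for n in coarse}
    for _ in range(n_paths):
        # one Brownian path on the finest grid, reused for every step size
        dW_fine = [rng.gauss(0.0, math.sqrt(T / n_fine)) for _ in range(n_fine)]
        exact = x0 * math.exp((mu - 0.5 * s * s) * T + s * sum(dW_fine))
        for n in coarse:
            h, per = T / n, n_fine // n
            xe = xm = x0
            for k in range(n):
                dW = sum(dW_fine[k * per:(k + 1) * per])  # coarse increment
                xe += mu * xe * h + s * xe * dW
                # Milstein adds 0.5 * b * b' * (dW^2 - h) with b(x) = s * x
                xm += mu * xm * h + s * xm * dW + 0.5 * s * s * xm * (dW * dW - h)
            sums[n][0] += abs(xe - exact)
            sums[n][1] += abs(xm - exact)
    return {n: (e / n_paths, m / n_paths) for n, (e, m) in sums.items()}


errs = strong_errors()
(e16, m16), (e64, m64) = errs[16], errs[64]
order_euler = math.log(e16 / e64) / math.log(4)
order_milstein = math.log(m16 / m64) / math.log(4)
print(f"Euler order ~ {order_euler:.2f}, Milstein order ~ {order_milstein:.2f}")
```

With a single Wiener process no Levy Area appears, so the Milstein scheme keeps its first order here; the difficulty discussed in this section only arises with two or more Wiener processes.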

On the other hand, in some problems the diffusion coefficients have special properties that allow the Milstein scheme to be simplified so as to avoid the use of Levy Areas. As is well known, if the SDE is commutative (44), the Levy Areas need not be computed. Unfortunately, for many important practical financial problems (e.g., stochastic volatility models), the diffusion coefficients do not satisfy these conditions. The present study confirms experimentally that the inclusion of the Levy Area in a strong scheme cannot be avoided if one wants to achieve strong order one: omitting the Levy Area terms in the Milstein scheme yields only strong order 0.5, which is already achieved by the Euler scheme. In addition, the difference in the leading error between the Euler and Milstein schemes is rather small.

The purpose of the paper is to show that, if certain conditions are satisfied, one can avoid the calculation of the Levy Area and obtain first order convergence by applying an orthogonal transformation when the multi-dimensional SDE does not satisfy the commutativity conditions (44). We introduce a new scheme, or discrete time approximation, based on an idea of Paul Malliavin in which, under certain conditions, a better convergence order than that of the standard Milstein scheme is obtained without the expensive simulation of the Levy Area. We show when the conditions of the 2-dimensional problem permit this and give an exact solution for the orthogonal transformation (θ scheme).

The convergence analysis in this paper requires the SDE to satisfy global Lipschitz conditions on the drift and diffusion coefficients. This is a standard requirement for this type of analysis. However, most of the SDE models mentioned and used in the computational experiments do not satisfy such global Lipschitz conditions; problems arise at the origin and/or at infinity. The results in this paper give numerical evidence that the conclusions regarding strong order remain true in circumstances where no theory currently exists. However, developing such a theory is not within the scope of this research.

To simplify the understanding of the orthogonal transformation and the θ scheme, the first sections consider only the 2-dimensional case. In Section 2 we introduce the θ scheme and show how it can be obtained. In Section 3, we include four examples of the θ scheme applied to stochastic volatility models that are important in financial applications. In Section 4, we present the definition of the θ scheme for the general 2-dimensional case. In Section 5, we present the definition of the θ scheme for the general multi-dimensional case. Finally, we present the conclusions, future work, references, and an appendix.
where the subscripts x and y denote partial derivatives, and $L^{(1,2)}$ is the Levy Area defined by:

$$L^{(1,2)}_t = \int_t^{t+\Delta t}\int_t^{s} dW_{1,u}\,dW_{2,s} - \int_t^{t+\Delta t}\int_t^{s} dW_{2,u}\,dW_{1,s}, \tag{4}$$

and $[A_1, A_2]$ is the Lie bracket defined by ($\partial A_i$ is the Jacobian matrix of $A_i$):

$$[A_1, A_2] = \partial A_2\,A_1 - \partial A_1\,A_2 = \sqrt{1-\rho^2}\begin{pmatrix} -\xi\,\sigma_y \\ \sigma\,\xi_x \end{pmatrix}. \tag{5}$$



As is well known, the Levy Areas need not be computed if the SDE is commutative (44). For this commutativity condition to hold, (1) must satisfy:

$$\frac{\partial \sigma}{\partial y} = 0 \quad\text{and}\quad \frac{\partial \xi}{\partial x} = 0. \tag{6}$$
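The Lie bracket (5) and the commutativity test (6) can be checked numerically at any point. The Python sketch below is our own illustration: it assumes that (2) is obtained with the Cholesky form dW_2 = ρ dB_1 + √(1−ρ²) dB_2 (B_1, B_2 independent), approximates the Jacobians by central differences, and evaluates [A_1, A_2]. The helper names (`jacobian`, `lie_bracket`, `independent_noise_fields`) and the coefficient choices are hypothetical.

```python
import math


def jacobian(f, z, eps=1e-6):
    """Central-difference Jacobian J[i][k] = d f_i / d z_k of f: R^n -> R^n."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for k in range(n):
        zp, zm = list(z), list(z)
        zp[k] += eps
        zm[k] -= eps
        fp, fm = f(zp), f(zm)
        for i in range(n):
            J[i][k] = (fp[i] - fm[i]) / (2.0 * eps)
    return J


def lie_bracket(A1, A2, z):
    """[A1, A2] = (dA2) A1 - (dA1) A2 evaluated at the point z, as in (5)."""
    J1, J2 = jacobian(A1, z), jacobian(A2, z)
    a1, a2 = A1(z), A2(z)
    return [sum(J2[i][k] * a1[k] - J1[i][k] * a2[k] for k in range(len(z)))
            for i in range(len(z))]


def independent_noise_fields(sigma, xi, rho):
    """Diffusion columns of (1) after writing dW_2 = rho dB_1 + sqrt(1 - rho^2) dB_2
    with independent B_1, B_2 (assumed Cholesky form)."""
    rb = math.sqrt(1.0 - rho * rho)
    A1 = lambda z: [sigma(z[0], z[1]), rho * xi(z[0], z[1])]
    A2 = lambda z: [0.0, rb * xi(z[0], z[1])]
    return A1, A2


# Quadratic-volatility-type coefficients sigma = x*y, xi = beta*y^2 (illustrative)
rho, beta = -0.5, 1.22
A1, A2 = independent_noise_fields(lambda x, y: x * y, lambda x, y: beta * y * y, rho)
print(lie_bracket(A1, A2, [1.0, 0.3]))
```

For σ = xy and ξ = βy² one has σ_y ≠ 0, so the first component is non-zero and (6) fails; for coefficients with σ_y = ξ_x = 0 both components vanish.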



II. Orthogonal transformation 2D



To simplify the understanding of the orthogonal 
transformation, we begin with the 2-Dimensional stochastic 
case: 



$$dx = \mu^{(x)}(x,y,t)\,dt + \sigma(x,y,t)\,dW_{1,t}, \qquad dy = \mu^{(y)}(x,y,t)\,dt + \xi(x,y,t)\,dW_{2,t}, \qquad E[dW_{1,t}\,dW_{2,t}] = \rho\,dt, \tag{1}$$

where $W_{1,t}$ and $W_{2,t}$ are two Wiener processes and the coefficient functions $\mu^{(x)}$, $\mu^{(y)}$, σ and ξ are assumed to satisfy the linear growth and global Lipschitz conditions ([4], p. 548) for existence and uniqueness of a strong solution to the SDE (1). Alternatively, (1) can be represented in vector form as:

$$dZ(t) = A_0(t,Z)\,dt + \sum_{k=1}^{2} A_k(t,Z)\,dW_{k,t}, \qquad Z \in \mathbb{R}^2.$$

This is, in fact, only a symbolic representation of the stochastic integral equation

$$Z(t) = Z(t_0) + \int_{t_0}^{t} A_0(s,Z)\,ds + \sum_{k=1}^{2}\int_{t_0}^{t} A_k(s,Z)\,dW_{k,s}.$$



The first integral is a deterministic Riemann integral and the 
second is a stochastic integral. Using the standard definition 
of constant correlation, one can represent the system (1) in 
vector form with independent noise as: 



If the conditions (6) are satisfied, the coefficients of the 
Levy Area (5) are equal to zero and we do not need to 
simulate (4). Unfortunately, only special cases satisfy the 
conditions. If we want to use stochastic volatility models, 
the SDE (1) will never be commutative. 

The numerical difficulty with the Milstein scheme is how to simulate the Levy Area $L^{(1,2)}$ efficiently (it is computationally very expensive). The technique of Gaines and Lyons [3] can be used to sample the distribution of the Levy Area conditional on $\Delta W_{1,t}$ and $\Delta W_{2,t}$. However, there is no generalization of this to higher dimensions apart from the approximation of [16], which has a significant computational cost (hours for a good approximation, i.e., a small error).

On the other hand, if one makes an orthogonal transformation of the uncorrelated process (2), one does not change the distribution (see Theorem 2 (56)) and gets:

$$dx = \mu^{(x)}(x,y,t)\,dt + \sigma(x,y,t)\,d\hat W_{1,t}, \qquad dy = \mu^{(y)}(x,y,t)\,dt + \xi(x,y,t)\,d\hat W_{2,t}, \tag{7}$$
and the angle θ is a function of x and y. If one computes the coefficients of the Levy Area (Lie bracket) for the new orthogonal process using independent Brownian paths, one obtains:

$$[A_1, A_2] = \begin{pmatrix} -\sqrt{1-\rho^2}\,\xi\,\sigma_y - \sigma^2\theta_x - \rho\,\sigma\,\xi\,\theta_y \\ \sqrt{1-\rho^2}\,\sigma\,\xi_x - \rho\,\sigma\,\xi\,\theta_x - \xi^2\theta_y \end{pmatrix}.$$

To avoid having to simulate the Levy Area $L^{(1,2)}$, that is, to make (7) commutative, one needs the Lie brackets to be identically zero, i.e., one must impose the following conditions:

$$-\sqrt{1-\rho^2}\,\xi\,\sigma_y - \sigma^2\theta_x - \rho\,\sigma\,\xi\,\theta_y = 0,$$

$$\sqrt{1-\rho^2}\,\sigma\,\xi_x - \rho\,\sigma\,\xi\,\theta_x - \xi^2\theta_y = 0.$$

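Simplifying, one gets:

$$\frac{\partial\theta}{\partial x} = \Phi(x,y) = -\frac{\xi^2\sigma_y + \rho\,\sigma^2\xi_x}{\sqrt{1-\rho^2}\,\sigma^2\xi}, \qquad \frac{\partial\theta}{\partial y} = \Psi(x,y) = \frac{\sigma^2\xi_x + \rho\,\xi^2\sigma_y}{\sqrt{1-\rho^2}\,\sigma\,\xi^2}. \tag{9}$$

If one wants to find a solution for θ(x,y), one must first determine when the system is consistent, or integrable. This requires that:

$$\frac{\partial \Phi}{\partial y} = \frac{\partial^2 \theta}{\partial x\,\partial y} = \frac{\partial \Psi}{\partial x}, \tag{10}$$

and the solution for θ is:

$$\theta(x,y) = \int^{(x,y)} \left(\Phi\,dx + \Psi\,dy\right).$$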
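Because Φ and Ψ are only known pointwise, condition (10) can also be probed numerically: θ exists exactly when the line integral of Φ dx + Ψ dy is path independent. In the Python sketch below (our own, with hypothetical names), `theta_gradient` solves the two linear conditions imposed on θ_x and θ_y above, with √(1−ρ²) coming from the Cholesky form assumed for (2), and `theta_increment` integrates along the two L-shaped paths joining the same end points. The coefficient choices are illustrative versions of the quadratic and GARCH volatilities of Section III.

```python
import math


def theta_gradient(x, y, sig, sig_y, xi, xi_x, rho):
    """(Phi, Psi) = (d theta/dx, d theta/dy) solving
        sigma^2 Phi + rho sigma xi Psi = -sqrt(1-rho^2) xi sigma_y
        rho sigma xi Phi + xi^2 Psi    =  sqrt(1-rho^2) sigma xi_x
    (our own rearrangement; sig_y and xi_x are the partial derivatives)."""
    rb = math.sqrt(1.0 - rho * rho)
    s, k = sig(x, y), xi(x, y)
    sy, kx = sig_y(x, y), xi_x(x, y)
    phi = -(k * k * sy + rho * s * s * kx) / (rb * s * s * k)
    psi = (s * s * kx + rho * k * k * sy) / (rb * s * k * k)
    return phi, psi


def theta_increment(p0, p1, grad, x_first=True, n=2000):
    """Line integral of Phi dx + Psi dy from p0 to p1 along an L-shaped path
    (x leg first or y leg first), using the midpoint rule on each leg."""
    corner = (p1[0], p0[1]) if x_first else (p0[0], p1[1])
    total = 0.0
    for (ax, ay), (bx, by) in ((p0, corner), (corner, p1)):
        for i in range(n):
            t = (i + 0.5) / n
            phi, psi = grad(ax + t * (bx - ax), ay + t * (by - ay))
            total += (phi * (bx - ax) + psi * (by - ay)) / n
    return total


rho = -0.5
# sigma = x*y, xi = 1.22*y^2 (quadratic-volatility type): condition (10) holds
quad = lambda x, y: theta_gradient(x, y, lambda u, v: u * v, lambda u, v: u,
                                   lambda u, v: 1.22 * v * v, lambda u, v: 0.0, rho)
# sigma = x*sqrt(y), xi = 0.78*y (GARCH type): condition (10) fails
garch = lambda x, y: theta_gradient(x, y, lambda u, v: u * math.sqrt(v),
                                    lambda u, v: 0.5 * u / math.sqrt(v),
                                    lambda u, v: 0.78 * v, lambda u, v: 0.0, rho)
p0, p1 = (1.0, 0.04), (1.5, 0.09)
for name, g in (("quadratic", quad), ("GARCH", garch)):
    print(name, theta_increment(p0, p1, g, True), theta_increment(p0, p1, g, False))
```

For the first pair of coefficients the two paths agree to rounding error, while for the second they differ by roughly 0.3, so no global θ(x, y) exists there.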

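However, if one applies Itô's lemma, one also obtains the following SDE for θ(x,y):

$$d\theta = \mu^{(\theta)}\,dt + \sigma\,\Phi\,d\hat W_{1,t} + \xi\,\Psi\,d\hat W_{2,t}, \qquad \mu^{(\theta)} = \Phi\,\mu^{(x)} + \Psi\,\mu^{(y)} + \frac{1}{2}\left(\sigma^2\frac{\partial\Phi}{\partial x} + 2\rho\,\sigma\,\xi\,\frac{\partial^2\theta}{\partial x\,\partial y} + \xi^2\frac{\partial\Psi}{\partial y}\right). \tag{11}$$

If one chooses to define θ in this way, our system becomes a 3-dimensional Itô process with two Wiener process inputs (θ scheme):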
If one computes the Lie brackets again with independent noise, one obtains (see Appendix (55)):

$$[A_1, A_2] = \begin{pmatrix} 0 \\ 0 \\ \sqrt{1-\rho^2}\,\sigma\,\xi\left(\dfrac{\partial\Psi}{\partial x} - \dfrac{\partial\Phi}{\partial y}\right)\end{pmatrix}. \tag{14}$$

Note that when condition (10) is satisfied, this Lie bracket (14) is identically zero. Because the value of the Lie bracket (14) does not depend on the drift of θ, it is convenient to set this drift to zero:

$$\mu^{(\theta)} = 0.$$



In the remainder of the paper, we investigate when particular applications satisfy condition (10), in which case one can discretise either (7) or (13), and when they do not, in which case one can only discretise (13) or the original untransformed SDE (1). Our objective is to achieve higher order strong convergence without the simulation of the Levy Areas.
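Discretising (1), (7) or (13) without Levy Areas uses only the Itô-operator terms of the Milstein scheme. The Python sketch below is our own generic version of such a step for independent noise; the name `milstein_step_no_levy`, the column-based interface and the central-difference Jacobians are assumptions made for illustration. It keeps the symmetric part ½(ΔW_{j1}ΔW_{j2} − δ_{j1j2}Δt) of each double Itô integral and drops the antisymmetric Levy-Area part, which is what setting the Levy Area to zero amounts to. For (7) one would pass the columns of the diffusion matrix after rotation by θ(x, y); for (13) one would append θ as a third state variable.

```python
def milstein_step_no_levy(x, a, b, dt, dW, eps=1e-6):
    """One Milstein step for dX = a(X) dt + sum_j b_j(X) dW_j (independent W_j)
    with every Levy area set to zero. b(x) returns the M diffusion columns."""
    n, cols = len(x), b(x)
    M = len(cols)
    # jac[j][i][l] = d b_{i,j} / d x_l by central differences
    jac = [[[0.0] * n for _ in range(n)] for _ in range(M)]
    for l in range(n):
        xp, xm = list(x), list(x)
        xp[l] += eps
        xm[l] -= eps
        cp, cm = b(xp), b(xm)
        for j in range(M):
            for i in range(n):
                jac[j][i][l] = (cp[j][i] - cm[j][i]) / (2.0 * eps)
    drift = a(x)
    new = []
    for i in range(n):
        v = x[i] + drift[i] * dt + sum(cols[j][i] * dW[j] for j in range(M))
        for j1 in range(M):
            for j2 in range(M):
                # Ito operator L^{j1} b_{i,j2} = sum_l b_{l,j1} d b_{i,j2} / d x_l
                Lb = sum(cols[j1][l] * jac[j2][i][l] for l in range(n))
                # symmetric part of I_(j1,j2); the antisymmetric Levy part is dropped
                v += 0.5 * Lb * (dW[j1] * dW[j2] - (dt if j1 == j2 else 0.0))
        new.append(v)
    return new


# Two independent geometric Brownian motions (commutative noise), one step
mu, s = 0.05, 0.4
step = milstein_step_no_levy([1.0, 2.0], lambda z: [mu * z[0], mu * z[1]],
                             lambda z: [[s * z[0], 0.0], [0.0, s * z[1]]],
                             0.01, [0.05, -0.03])
print(step)
```

For commutative noise such as this pair of independent geometric Brownian motions, dropping the Levy Area costs nothing, and the step reduces to the scalar Milstein update in each component.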

When the Lie bracket is not equal to zero, the important question is how precisely θ needs to be calculated to obtain first order strong convergence in x and y. For example, does neglecting the Lie bracket affect the accuracy of θ but not that of x and y?

One approach to the θ scheme is given by Ana-Bela Cruzeiro, Paul Malliavin and T. Thalmaier in [2]. Because $d\hat W$ and $d\tilde W$ have the same distribution (see Theorem 2 (56)), they ignore the calculation of θ. For example, the 1.0 strong order Milstein scheme for (7) with time step Δt using (9) is (see Appendix (51)):
Replacing $\Delta\hat W$ by $\Delta\tilde W$ in (15), one obtains the Malliavin scheme published in [2] and in the book [11]. The advantage of this scheme is that one does not need to simulate the Levy Area or be concerned about the value of θ at every time step. For weak solutions, the Malliavin scheme is a good approach. However, for strong solutions it has the same or a worse strong convergence constant than both the scheme that includes the simulation of θ and the Milstein scheme (3) without the orthogonal transformation. For illustration, see the examples in the next section with simulation plots (Figures 1 to 4).



III. Orthogonal stochastic volatility models 

In this section, we consider four mean-reverting stochastic volatility models (SVMs). The aim of a stochastic volatility model is to incorporate the empirical observation that volatility appears not to be constant and indeed varies, at least in part, randomly. The idea is to make the volatility itself a stochastic process. The candidate models have generally been motivated by intuition, convenience and a desire for tractability. In particular, the SVMs presented in this section have all appeared in the literature and have the following generic form:



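$$dx = \mu^{(x)}\,dt + \sigma\,x^{\gamma_1}y^{\lambda_1}\,dW_{1,t}, \qquad dy = \mu^{(y)}\,dt + \beta\,x^{\gamma_2}y^{\lambda_2}\,dW_{2,t}, \qquad E[dW_{1,t}\,dW_{2,t}] = \rho\,dt. \tag{16}$$

If one applies an orthogonal transformation, (16) changes to:

$$dx = \mu^{(x)}\,dt + \sigma\,x^{\gamma_1}y^{\lambda_1}\,d\hat W_{1,t}, \qquad dy = \mu^{(y)}\,dt + \beta\,x^{\gamma_2}y^{\lambda_2}\,d\hat W_{2,t}, \tag{17}$$

where $d\hat W_{i,t}$ are the orthogonal correlated Wiener processes defined in (8). If one would like to obtain an exact solution for θ (11), the integrability condition (10) becomes:

$$\frac{\partial\Phi}{\partial y} = -\frac{\lambda_c\,\lambda_1\,\beta\,y^{\lambda_c-1}}{\sqrt{1-\rho^2}\,\sigma\,x^{\gamma_c+1}}, \qquad \frac{\partial\Psi}{\partial x} = \frac{\gamma_c\,\gamma_2\,\sigma\,x^{\gamma_c-1}}{\sqrt{1-\rho^2}\,\beta\,y^{\lambda_c+1}}, \qquad \frac{\partial\Phi}{\partial y} = \frac{\partial\Psi}{\partial x}, \tag{18}$$

with $\gamma_c = \gamma_1 - \gamma_2 - 1$ and $\lambda_c = \lambda_2 - \lambda_1 - 1$. So, for σ, β, λ_i, γ_i ≠ 0, one can conclude that θ is integrable if, and only if, λ_c = γ_c = 0, in which case the solution is:

$$\theta(x,y) = \frac{\rho\,\lambda_1\,\beta + \gamma_2\,\sigma}{\sqrt{1-\rho^2}\,\beta}\,\log y - \frac{\rho\,\gamma_2\,\sigma + \lambda_1\,\beta}{\sqrt{1-\rho^2}\,\sigma}\,\log x. \tag{19}$$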
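The exponent bookkeeping behind (18) and (19) is easy to automate. The Python sketch below is our own: the exponent table is our reading of models (22)-(25) against the generic form (16), `integrability_exponents` returns (γ_c, λ_c), and `theta_closed_form` evaluates (19) with the integration constant set to zero. The names are hypothetical.

```python
import math

# Exponents (gamma_1, lambda_1, gamma_2, lambda_2) of the generic form (16),
# as we read them from models (22)-(25).
MODELS = {
    "Quadratic (22)": (1.0, 1.0, 0.0, 2.0),
    "3/2 (23)": (1.0, 0.5, 0.0, 1.5),
    "GARCH (24)": (1.0, 0.5, 0.0, 1.0),
    "Heston (25)": (1.0, 0.5, 0.0, 0.5),
}


def integrability_exponents(g1, l1, g2, l2):
    """gamma_c and lambda_c of (18); theta is integrable iff both vanish."""
    return g1 - g2 - 1.0, l2 - l1 - 1.0


def theta_closed_form(x, y, sigma, beta, rho, g1, l1, g2, l2):
    """Rotation angle (19), valid only when gamma_c = lambda_c = 0
    (the additive integration constant is taken to be zero)."""
    rb = math.sqrt(1.0 - rho * rho)
    return ((rho * l1 * beta + g2 * sigma) / (rb * beta) * math.log(y)
            - (rho * g2 * sigma + l1 * beta) / (rb * sigma) * math.log(x))


for name, exps in MODELS.items():
    print(name, integrability_exponents(*exps))
```

The λ_c values come out as 0, 0, -0.5 and -1, which is why only the first two models admit the exact angle (19).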
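Using the θ scheme (13), the 3-dimensional Itô process for (17) is:

If one computes the Lie brackets, one obtains:

$$[A_1, A_2] = \begin{pmatrix}0\\0\\ \sqrt{1-\rho^2}\,\sigma\beta\,x^{\gamma_1+\gamma_2}y^{\lambda_1+\lambda_2}\left(\dfrac{\partial\Psi}{\partial x} - \dfrac{\partial\Phi}{\partial y}\right)\end{pmatrix} = \begin{pmatrix}0\\0\\ \gamma_c\gamma_2\sigma^2x^{2\gamma_1-2}y^{2\lambda_1} + \lambda_c\lambda_1\beta^2x^{2\gamma_2}y^{2\lambda_2-2}\end{pmatrix}. \tag{21}$$

Figure 1: Strong convergence test for x (Case 1).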

Even when condition (18) is not satisfied, one can perhaps improve the convergence by using the θ scheme (20) without the simulation of the Levy Areas. However, this depends on the parameters of the system; in other words, the accuracy depends on the value of the Lie bracket (21) of the scheme, which determines the bias in the calculation of θ and hence in x and y. Note that when condition (18) is satisfied, this Lie bracket (21) is identically zero.

A. The Quadratic Volatility Model (Case 1)

The first case we consider is the Quadratic Volatility Model:

$$dx = x\mu\,dt + x\,y\,dW_{1,t}, \qquad dy = \kappa(\vartheta_2 - y)\,dt + \beta\,y^2\,dW_{2,t}. \tag{22}$$

The Quadratic Volatility Model is a typical explosive model in financial applications where high or extreme volatility shocks persist through time. In this model, the volatility process itself is an OU process with a mean reversion level $\vartheta_2$. The disadvantages of this model are that the volatility can easily become negative and that no closed form solution is available for option pricing.

Because λ_c = 0, one can use either equation (17) together with (19), or the 3-dimensional θ scheme (20). Because of the orthogonal transformation, neither requires the calculation of the Levy Area. Figure 1 and Table 1 show that, as expected, the Euler scheme and the Milstein scheme with zero Levy Areas (setting $L^{(1,2)} = 0$ in (3)) give strong convergence of order 0.5. On the other hand, the Milstein scheme (3) with a proper value for the distribution of the Levy Area (obtained by simulating the Levy Area using N subintervals within each time step) gives strong convergence of order 1.0, as do the two orthogonal θ schemes. We
have used the following parameters: t 0 = 0; T= 1: 
p= -0.50; 77= 0.1 : k=\A: =0.32: 

/?=i.22 and initial conditions x(t Q )=l y{t 0 ) = ™ 2- 



B. The 3/2 Model (Case 2)

The second case we consider is the following stochastic variance model, usually called the 3/2 Model [10]:

$$dx = x\mu\,dt + x\sqrt{y}\,dW_{1,t}, \qquad dy = \kappa\,y\,(\vartheta_{3/2} - y)\,dt + \beta\,y^{3/2}\,dW_{2,t}. \tag{23}$$

The 3/2 Model is an important model in finance, not only because it has a closed form solution for option pricing as simple as that of the square root model (25), but also because it displays a feature of many stochastic volatility models that one does not see in the square root model. That is, even after a change of measure to the risk-adjusted process, option prices (relative to the bond price) under the 3/2 model are sometimes not martingales but merely local martingales [10]. When option prices are not martingales, they are not given by the standard expected value formula (e.g., $e^{-rT}E[\max(S_T - K, 0)]$ for a call option). So the 3/2 Model, with its closed form solution for European and Digital options, is one of the simplest illustrations of this important phenomenon for financial theory. It was first used by Cox, Ingersoll, and Ross ([1], 1985) and further investigated by Heston ([7], 1997) and Lewis ([10], 2000).

Because λ_c = 0, we obtain almost the same results as in Case 1 (Figure 2 and Table 1). The parameters and initial
conditions are the same as in Case 1 except for ^3/2 = ; 
P = 2A4: y(t 0 ) = u^; which are chosen so that x and y will
have approximately the same relative volatility. 

C. The GARCH Diffusion Model (Case 3)

The third case we consider is the following stochastic variance model, usually called the GARCH Diffusion Model:

$$dx = x\mu\,dt + x\sqrt{y}\,dW_{1,t}, \qquad dy = \kappa(\vartheta_1 - y)\,dt + \beta\,y\,dW_{2,t}. \tag{24}$$

Model (24) is described as the diffusion limit of a GARCH-type process. The failure of the usual martingale pricing relation can also occur in this SVM and was first shown by Sin [15] in 1998. These failures are specific examples of the notion that the absence of arbitrage implies that financial claim prices are, in general, only strictly local martingales, not martingales [10]. From a practical point of view, the advantage of this model is that its parameters can be estimated using well-known algorithms that are available as computer software, although no closed form solution is available for option pricing.

In this case, where λ_c = -0.5, it is not possible to use the 2D-θ scheme since the integrability condition is not satisfied. Figure 3 and Table 1 show that the only schemes that achieve first order convergence are the Milstein and θ schemes which simulate the Levy Area. However, the simulation results also show a remarkable difference between the original and the orthogonal scheme without the simulation of the Levy Area: not the improved order of convergence achieved in the first case, but a much improved constant of proportionality. The parameters and
initial conditions are the same as in Case 2 except for 
This is chosen = td^: 0 = 0.78. 



to ensure that x and y will have approximately the same 
relative volatility as in the first two cases. 

D. The Square Root Model (Case 4)

The last case we consider for stochastic variance models is Heston's Square Root Model:

$$dx = x\mu\,dt + x\sqrt{y}\,dW_{1,t}, \qquad dy = \kappa(\vartheta_{1/2} - y)\,dt + \beta\,y^{1/2}\,dW_{2,t}. \tag{25}$$

This model was proposed by Heston in 1993 [6]. The volatility is related to a square root process and can be interpreted as the radial distance from the origin of a multi-dimensional OU process. For small dt, this model keeps the volatility positive, and it is the most popular of all SVMs because of two main features: it has a semi-analytical pricing formula for European and Digital options which is easy to implement, and its solution is typical (it displays the same qualitative properties that one generally expects in time-homogeneous cases). Furthermore, it can be used to understand how volatility models that do not have analytical solutions behave in many respects.

In this case, λ_c = -1. Figure 4 and Table 1 show that neither of the Milstein schemes in which the Levy Areas are set to zero performs very well. Both have strong convergence of order 0.5, and the constant of proportionality is not much better than for the Euler scheme. When the Levy Areas are simulated correctly, the Milstein and θ schemes do exhibit the expected first order strong convergence. This demonstrates the importance of the Levy Areas in this case.

The parameters and initial conditions are the same as in 
Case 2 except for cui /2 = ^ /3 = 0.25. 



| Scheme | Description | C-1 | C-2 | C-3 | C-4 |
|---|---|---|---|---|---|
| Euler scheme | set Δt = dt, ΔW_i = dW_i in (2) | 0.49 | 0.50 | 0.51 | 0.50 |
| Milstein (L=0) | Milstein (3), set L^(1,2) = 0 | 0.52 | 0.54 | 0.53 | 0.53 |
| Milstein sch. | Milstein (3), simulate L^(1,2) | 0.94 | 0.95 | 0.96 | 0.96 |
| Malliavin sch. | Milstein (17), set ΔŴ_j = ΔW̃_j | 0.50 | 0.52 | 0.50 | 0.49 |
| 2D-θ scheme | Milstein (17) with (19) | 0.96 | 0.95 | n/a¹ | n/a |
| 3D-θ sch (L=0) | Milstein (20), set L^(1,2) = 0 | 0.96 | 0.95 | 0.78 | 0.63 |
| 3D-θ scheme | Milstein (20), simulate L^(1,2) | 0.96 | 0.95 | 0.95 | 0.94 |

Table 1: Strong convergence orders for SVMs (all cases (22)-(25)).²

¹ n/a = not applicable.

² Note that the constants of proportionality in Figures 1 through 4 depend entirely on the chosen parameter values in the examples and can be very different for another choice. To calculate the strong order of convergence we have used the theorems in [12] or [13].
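The orders above were obtained with the theorems in [12] or [13]. A common and simpler alternative, not necessarily the procedure used for Table 1, is a least-squares fit of log(error) against log(Δt): the slope is the empirical strong order and the intercept gives the constant of proportionality discussed in note 2. The Python sketch below is our own; `estimate_order` is a hypothetical name, and the usage feeds it synthetic errors of the form C·Δt^p.

```python
import math


def estimate_order(dts, errors):
    """Least-squares fit of log(error) = log(C) + p * log(dt).
    Returns (p, C): the empirical strong order and constant of proportionality."""
    xs = [math.log(h) for h in dts]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx
    return p, math.exp(my - p * mx)


dts = [2.0 ** -k for k in range(3, 8)]
print(estimate_order(dts, [0.7 * h ** 0.5 for h in dts]))  # (0.5, 0.7) up to rounding
```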
Figure 2: Strong convergence test for x (Case 2).

Figure 3: Strong convergence test for x (Case 3).

Figure 4: Strong convergence test for x (Case 4).
IV. 2D Orthogonal Milstein Scheme (θ Scheme)

This section presents the definition of the θ scheme for the 2-dimensional SDE case. At the end of the section, we present two more examples that confirm a better order of convergence than the standard Milstein scheme without the simulation of the Levy Area when one uses an orthogonal transformation.

A. 2D-θ Scheme

Theorem 1: 2D-θ Scheme (Exact solution)

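If one has a 2-dimensional Itô stochastic differential equation with two independent Wiener processes:

$$\begin{pmatrix} dX_{1,t} \\ dX_{2,t}\end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} dt + \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2}\end{pmatrix}\begin{pmatrix} dW_{1,t} \\ dW_{2,t}\end{pmatrix}, \tag{26}$$

where $a_i$ and $b_{i,j}$ are smooth functions of t and $X_t$ satisfying the linear growth and global Lipschitz conditions ([4], p. 548), and if one applies an orthogonal transformation to (26) described by:

$$\begin{pmatrix} d\hat W_{1,t} \\ d\hat W_{2,t}\end{pmatrix} = \begin{pmatrix} \cos\theta_t & -\sin\theta_t \\ \sin\theta_t & \cos\theta_t\end{pmatrix}\begin{pmatrix} dW_{1,t} \\ dW_{2,t}\end{pmatrix}, \tag{27}$$

where:

$$\theta_t(X_1, X_2) = \int \left(\Phi\,dX_1 + \Psi\,dX_2\right), \tag{28}$$

and satisfying:

$$\frac{\partial\Phi}{\partial X_2} = \frac{\partial\Psi}{\partial X_1}, \tag{29}$$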

then the new orthogonal process has 1.0 strong order convergence using the Milstein scheme neglecting the simulation of the Levy Area. Conversely, for $H^i_- \neq 0$ (the commutativity condition (44) is not satisfied), the Milstein scheme of (26) with zero Levy Area has 0.5 strong order convergence. The functions Φ and Ψ are equal to:

$$\Phi = \frac{H^1_-\,(b_{2,1}^2 + b_{2,2}^2) - H^2_-\,(b_{1,1}b_{2,1} + b_{1,2}b_{2,2})}{(b_{1,1}b_{2,2} - b_{1,2}b_{2,1})^2}, \qquad \Psi = \frac{H^2_-\,(b_{1,1}^2 + b_{1,2}^2) - H^1_-\,(b_{1,1}b_{2,1} + b_{1,2}b_{2,2})}{(b_{1,1}b_{2,2} - b_{1,2}b_{2,1})^2},$$

where $H^i_-$ are the coefficients of the Levy Area (Lie bracket) of (26) and are defined by:

$$H^i_- = L^1 b_{i,2} - L^2 b_{i,1}.$$



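Proof: The 1.0 strong order Milstein scheme for (26) with time step Δt is (Appendix (42)):

$$\begin{pmatrix} X_{1,t+\Delta t} \\ X_{2,t+\Delta t}\end{pmatrix} = \begin{pmatrix} X_{1,t} \\ X_{2,t}\end{pmatrix} + \begin{pmatrix} a_1 \\ a_2\end{pmatrix}\Delta t + \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2}\end{pmatrix}\begin{pmatrix}\Delta W_{1,t} \\ \Delta W_{2,t}\end{pmatrix} + \frac{1}{2}R_M,$$

$$R_M = \sum_{j=1}^{2}\begin{pmatrix} L^j b_{1,j} \\ L^j b_{2,j}\end{pmatrix}\left(\Delta W_{j,t}^2 - \Delta t\right) + \begin{pmatrix} H^1_+ \\ H^2_+\end{pmatrix}\Delta W_{1,t}\,\Delta W_{2,t} + \begin{pmatrix} H^1_- \\ H^2_-\end{pmatrix}L^{(1,2)}_t,$$

where:

$$H^i_\pm = L^1 b_{i,2} \pm L^2 b_{i,1}, \qquad L^j = \sum_{k=1}^{2} b_{k,j}\,\frac{\partial}{\partial X_k}. \tag{30}$$

For $H^i_- \neq 0$, the Milstein scheme has 1.0 strong order convergence when one includes all terms in the equation (see Theorem 10.3.5, page 350 of [9]); otherwise it has 0.5 strong order convergence. In general, if $X_T$ is the solution of the SDE (26) and $\hat X_T$ is the numerical approximation using the Milstein scheme, for $H^i_- \neq 0$ and neglecting the simulation of the Levy Area, one can say:

$$E\left(\left|X_T - \hat X_T\right|\right) \le C_1\,(\Delta t)^{0.5}.$$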

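On the other hand, if one applies the orthogonal transformation (27) to (26), one obtains:

$$\begin{pmatrix} dX_{1,t} \\ dX_{2,t}\end{pmatrix} = \begin{pmatrix} a_1 \\ a_2\end{pmatrix}dt + \begin{pmatrix} \hat b_{1,1} & \hat b_{1,2} \\ \hat b_{2,1} & \hat b_{2,2}\end{pmatrix}\begin{pmatrix} dW_{1,t} \\ dW_{2,t}\end{pmatrix}. \tag{31}$$

The 1.0 strong order Milstein scheme for (31) with time step Δt is (Appendix (48)):

$$\begin{pmatrix} X_{1,t+\Delta t} \\ X_{2,t+\Delta t}\end{pmatrix} = \begin{pmatrix} X_{1,t} \\ X_{2,t}\end{pmatrix} + \begin{pmatrix} a_1 \\ a_2\end{pmatrix}\Delta t + \begin{pmatrix} \hat b_{1,1} & \hat b_{1,2} \\ \hat b_{2,1} & \hat b_{2,2}\end{pmatrix}\begin{pmatrix}\Delta W_{1,t} \\ \Delta W_{2,t}\end{pmatrix} + \frac{1}{2}\hat R_M,$$

$$\hat R_M = \sum_{j=1}^{2}\begin{pmatrix} L^j\hat b_{1,j} \\ L^j\hat b_{2,j}\end{pmatrix}\left(\Delta W_{j,t}^2 - \Delta t\right) + \begin{pmatrix}\hat H^1_+ \\ \hat H^2_+\end{pmatrix}\Delta W_{1,t}\,\Delta W_{2,t} + \begin{pmatrix}\hat H^1_- \\ \hat H^2_-\end{pmatrix}L^{(1,2)}_t,$$

where $\hat H^i_\pm = L^1\hat b_{i,2} \pm L^2\hat b_{i,1}$, the Itô operators $L^j$ are built from the columns of $\hat b$, and:

$$\begin{pmatrix} \hat b_{1,1} & \hat b_{1,2} \\ \hat b_{2,1} & \hat b_{2,2}\end{pmatrix} = \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2}\end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix}.$$

If one computes the coefficients of the Levy Area using independent Wiener processes, one gets:

$$\hat H^i_- = H^i_- - \left(b_{i,1}b_{1,1} + b_{i,2}b_{1,2}\right)\frac{\partial\theta}{\partial X_1} - \left(b_{i,1}b_{2,1} + b_{i,2}b_{2,2}\right)\frac{\partial\theta}{\partial X_2}, \qquad i = 1, 2. \tag{32}$$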

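To avoid having to simulate the Levy Area, one needs (32) to be identically zero, i.e., one must impose the following conditions:

$$\hat H^1_- = 0, \qquad \hat H^2_- = 0.$$

Simplifying, one gets:

$$\frac{\partial\theta}{\partial X_1} = \Phi = \frac{H^1_-\,(b_{2,1}^2 + b_{2,2}^2) - H^2_-\,(b_{1,1}b_{2,1} + b_{1,2}b_{2,2})}{(b_{1,1}b_{2,2} - b_{1,2}b_{2,1})^2}, \qquad \frac{\partial\theta}{\partial X_2} = \Psi = \frac{H^2_-\,(b_{1,1}^2 + b_{1,2}^2) - H^1_-\,(b_{1,1}b_{2,1} + b_{1,2}b_{2,2})}{(b_{1,1}b_{2,2} - b_{1,2}b_{2,1})^2}.$$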
To find a solution for θ, one must first determine when the system is consistent (integrable); this requires condition (29), and the solution for θ is (28).
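Pointwise, (Φ, Ψ) in Theorem 1 is the solution of the 2×2 linear system obtained by setting (32) to zero, whose matrix is b bᵀ. The Python sketch below (our own, with hypothetical names) evaluates the formulas of Theorem 1 for a sample matrix b and sample coefficients $H^i_-$, then checks that the rotated coefficients (32) vanish. The numbers are arbitrary and only require det b ≠ 0.

```python
def theorem1_phi_psi(b, H):
    """Phi and Psi of Theorem 1 for a 2x2 diffusion matrix b = [[b11, b12], [b21, b22]]
    and Levy-area coefficients H = (H^1_-, H^2_-) evaluated at one point."""
    (b11, b12), (b21, b22) = b
    det2 = (b11 * b22 - b12 * b21) ** 2
    c = b11 * b21 + b12 * b22
    phi = (H[0] * (b21 ** 2 + b22 ** 2) - H[1] * c) / det2
    psi = (H[1] * (b11 ** 2 + b12 ** 2) - H[0] * c) / det2
    return phi, psi


def rotated_levy_coefficients(b, H, phi, psi):
    """Right-hand side of (32) with d theta/dX_1 = phi and d theta/dX_2 = psi."""
    (b11, b12), (b21, b22) = b
    s11, s12, s22 = b11 ** 2 + b12 ** 2, b11 * b21 + b12 * b22, b21 ** 2 + b22 ** 2
    return H[0] - s11 * phi - s12 * psi, H[1] - s12 * phi - s22 * psi


b = [[0.3, 0.1], [-0.2, 0.5]]
H = (0.7, -0.4)
phi, psi = theorem1_phi_psi(b, H)
print(rotated_levy_coefficients(b, H, phi, psi))  # both entries vanish up to rounding
```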

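B. 3D-θ Scheme

If one has a 2-dimensional Itô process (26) and applies an orthogonal transformation (27) to it, where the rotation angle $\theta_t$ is described using a third SDE:

$$\begin{pmatrix} dX_{1,t} \\ dX_{2,t} \\ d\theta_t\end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ 0\end{pmatrix}dt + \begin{pmatrix} \hat b_{1,1} & \hat b_{1,2} \\ \hat b_{2,1} & \hat b_{2,2} \\ \Phi\,\hat b_{1,1} + \Psi\,\hat b_{2,1} & \Phi\,\hat b_{1,2} + \Psi\,\hat b_{2,2}\end{pmatrix}\begin{pmatrix} dW_{1,t} \\ dW_{2,t}\end{pmatrix}, \tag{33}$$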



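then, for sufficiently smooth functions Φ and Ψ, the Milstein scheme for the 3-dimensional SDE (33) can have better strong convergence than the Milstein scheme for (26) neglecting the simulation of the Levy Area. The accuracy of $\theta_t$, and hence of $X_t$, depends on the value of the Lie bracket (34) of the process (33):

$$[A_1, A_2] = \begin{pmatrix} 0 \\ 0 \\ (b_{1,1}b_{2,2} - b_{1,2}b_{2,1})\left(\dfrac{\partial\Psi}{\partial X_1} - \dfrac{\partial\Phi}{\partial X_2}\right)\end{pmatrix}. \tag{34}$$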



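The 1.0 strong order Milstein scheme for (33) with time step Δt is (Appendix (53)):

$$\begin{pmatrix} X_{1,t+\Delta t} \\ X_{2,t+\Delta t} \\ \theta_{t+\Delta t}\end{pmatrix} = \begin{pmatrix} X_{1,t} \\ X_{2,t} \\ \theta_t\end{pmatrix} + \begin{pmatrix} a_1 \\ a_2 \\ 0\end{pmatrix}\Delta t + \begin{pmatrix} \hat b_{1,1} & \hat b_{1,2} \\ \hat b_{2,1} & \hat b_{2,2} \\ \hat b_{3,1} & \hat b_{3,2}\end{pmatrix}\begin{pmatrix}\Delta W_{1,t} \\ \Delta W_{2,t}\end{pmatrix} + \frac{1}{2}R_M,$$

$$R_M = \sum_{j=1}^{2}\begin{pmatrix} L^j\hat b_{1,j} \\ L^j\hat b_{2,j} \\ L^j\hat b_{3,j}\end{pmatrix}\left(\Delta W_{j,t}^2 - \Delta t\right) + \begin{pmatrix}\hat H^1_+ \\ \hat H^2_+ \\ \hat H^3_+\end{pmatrix}\Delta W_{1,t}\,\Delta W_{2,t} + \begin{pmatrix}\hat H^1_- \\ \hat H^2_- \\ \hat H^3_-\end{pmatrix}L^{(1,2)}_t, \qquad \hat H^i_\pm = L^1\hat b_{i,2} \pm L^2\hat b_{i,1},$$

where $\hat b_{3,j} = \Phi\,\hat b_{1,j} + \Psi\,\hat b_{2,j}$ and the Itô operators $L^j = \sum_{k=1}^{3}\hat b_{k,j}\,\partial/\partial X_k$ act on the three variables $(X_1, X_2, X_3 = \theta)$.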



If one computes the coefficients of the Levy Area of the last equation (Appendix (55)), one obtains:

$$\begin{pmatrix}\hat H^1_- \\ \hat H^2_- \\ \hat H^3_-\end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \hat H^3_-\end{pmatrix}. \tag{35}$$

If the value of $\hat H^3_-$ in the Lie bracket is small enough, the accuracy of $\theta_t$ is not affected by neglecting this term in the equation and, hence, the 3D Itô process (33) will have better strong convergence than (26) using the Milstein scheme neglecting the simulation of the Levy Area. Note that when condition (29) is satisfied, the Lie bracket ((34) or (35)) is identically zero ($\hat H^3_- = 0$).



C. Example of θ Scheme

Consider the following 2D SDEs:

$$dx = x\mu_x\,dt + 0.5\,x^{\gamma}\sqrt{y}\,dW_{1,t}, \qquad dy = y\mu_y\,dt + 0.5\,y^{\lambda}\sqrt{x}\,dW_{2,t}, \qquad E[dW_{1,t}\,dW_{2,t}] = \rho\,dt, \tag{36}$$

where $\mu_x = \mu_y = 0.05$, ρ = -0.2, $x(t_0) = 1$ and $y(t_0) = 0.3^2$.

If γ = λ = 1.5, then the integrability condition (29) or (18) is satisfied and either Theorem 1 (2D-θ scheme) or the 3D-θ scheme can be applied. Figures 5 and 6 show that the new orthogonal process of (36) has 1.0 strong order convergence in x and y using the Milstein scheme neglecting the simulation of the Levy Area. Conversely, the Euler, Malliavin and Milstein schemes with zero Levy Area have 0.5 strong order convergence in x and y.

If γ = λ = 1, then the integrability condition (29) or (18) is not satisfied and only the 3D-θ scheme can be applied. Figure 7 shows that the only schemes that achieve first order convergence are the Milstein and θ schemes which simulate the Levy Area. However, Figure 7 also shows a remarkable difference between the original and the orthogonal scheme without the simulation of the Levy Area. The improved order of convergence is not achieved as in the case γ = λ = 1.5, but there is a much improved constant of proportionality. Note that the constants of proportionality in Figures 5 through 7 depend entirely on the chosen parameter values in the examples and can be very different for another choice.

The numerical results do not simply confirm an outcome that has been rigorously derived in Theorem 1 under global Lipschitz conditions, but are indicative that, under less restrictive assumptions, such results are possible. Though some work has been done on convergence analysis under non-global Lipschitz conditions ([17] and [8]), this topic is not covered in this research.

Figure 5: Strong convergence test for x.

Figure 6: Strong convergence test for y.

Figure 7: Strong convergence test for x.

(Legend of Figures 5 to 7: Euler scheme; Milstein (L=0); Milstein sch.; Malliavin sch.; 2D-θ scheme; 3D-θ sch (L=0).)

V. θ Scheme (N-dimensional)

In this section we present a summary for the case in which one deals with an N-dimensional SDE and would like to apply an orthogonal transformation to avoid the calculation of the Levy Area. All models can be described through an SDE of the form:

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t, \qquad X(t_0) = X_0, \tag{37}$$

where:

$$X_t = X(t) \in \mathbb{R}^d, \qquad W_t \in \mathbb{R}^M, \qquad E[dW^i_t\,dW^k_t] = 0 \ \text{ for } i \neq k.$$

If one replaces the Wiener process $W_t$ with an orthogonal transform $\hat W_t$, the probability distribution does not change, and we obtain the set of all orthogonal transformations of our system (37):

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,d\hat W_t, \tag{38}$$

where:

$$d\hat W_t = \Gamma(\Theta_t)\,dW_t, \qquad \Gamma(\Theta_t) = \Gamma\left(\theta_{i,k}(t)\right) \in \mathbb{R}^{M\times M}.$$

Using the original (untransformed) Wiener processes, (38) can be represented by:

$$dX_t = \mu(X_t, t)\,dt + \hat\sigma(X_t, t, \Theta_t)\,dW_t, \tag{39}$$

where:

$$\hat\sigma(X_t, t, \Theta_t) = \left(\hat b_{i,k}(X_t, t, \Theta_t)\right), \qquad \hat b_{i,k}(X_t, t, \Theta_t) = \sum_{s=1}^{M} b_{i,s}\,\Gamma_{s,k}.$$

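The 1.0 strong order Milstein scheme for (39) with time step Δt using Itô operators is [9]:

$$X_{i,t+\Delta t} = X_{i,t} + a_i\,\Delta t + \sum_{j=1}^{M}\hat b_{i,j}\,\Delta W_{j,t} + \frac{1}{2}R_M.$$

Note that the coefficient functions $a_i$, $b_{i,j}$ are assumed to satisfy the linear growth and global Lipschitz conditions for existence and uniqueness of a strong solution to the SDE (37). If one uses the Levy Areas, $R_M$ is equal to:

$$R_M = \sum_{j_1=1}^{M}\sum_{j_2=1}^{M} L^{j_1}\hat b_{i,j_2}\left(\Delta W_{j_1,t}\,\Delta W_{j_2,t} - \delta_{j_1 j_2}\Delta t\right) + \sum_{j_1=1}^{M}\sum_{j_2=j_1+1}^{M}\left(L^{j_1}\hat b_{i,j_2} - L^{j_2}\hat b_{i,j_1}\right)L^{(j_1,j_2)}_t, \tag{40}$$

where $\delta_{j_1 j_2}$ is the Kronecker symbol ($\delta_{j_1 j_2} = 1$ if $j_1 = j_2$ and zero otherwise) and the Itô operators are defined in (30).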
Using the definition of the variables, the orthogonal 
properties and considering the vector fields to be 
independent of time, the coefficients for the Levy Area (40) 
are equal to: 




where are the orthogonal functions defined by: 



~ (-l ) fc+1 (©fc,fc+l^@fc,A: - ft ,fc+l) ■ 

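To avoid having to simulate the Levy Areas $L^{(j_1,j_2)}$, one needs to impose the following conditions:

$$L^{j_1}\hat b_{i,j_2} - L^{j_2}\hat b_{i,j_1} = 0, \qquad i = 1,\dots,d, \quad 1 \le j_1 < j_2 \le M. \tag{41}$$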
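Conditions (41) can be checked pointwise before deciding whether a transformation is needed. The Python sketch below (our own; `levy_area_coefficients` is a hypothetical name) evaluates the coefficients multiplying the Levy Areas in (40) by central differences for a d-dimensional SDE with M independent Wiener processes. It is applied here to untransformed coefficients; for (39) one would pass the columns of σ̂ instead.

```python
def levy_area_coefficients(b, x, eps=1e-6):
    """Coefficients L^{j1} b_{i,j2} - L^{j2} b_{i,j1} (j1 < j2) multiplying the Levy
    areas in (40). b(x) returns the M diffusion columns of a d-dimensional SDE with
    independent noise. Returns {(j1, j2): [coefficient for i = 0..d-1]}."""
    d, cols = len(x), b(x)
    M = len(cols)
    jac = [[[0.0] * d for _ in range(d)] for _ in range(M)]  # jac[j][i][l]
    for l in range(d):
        xp, xm = list(x), list(x)
        xp[l] += eps
        xm[l] -= eps
        cp, cm = b(xp), b(xm)
        for j in range(M):
            for i in range(d):
                jac[j][i][l] = (cp[j][i] - cm[j][i]) / (2.0 * eps)

    def L(j1, j2, i):  # Ito operator L^{j1} applied to b_{i,j2}
        return sum(cols[j1][l] * jac[j2][i][l] for l in range(d))

    return {(j1, j2): [L(j1, j2, i) - L(j2, j1, i) for i in range(d)]
            for j1 in range(M) for j2 in range(j1 + 1, M)}


# Diagonal noise is commutative, so every coefficient vanishes and (41) holds
diag = lambda z: [[0.2 * z[0], 0.0, 0.0], [0.0, 0.3 * z[1], 0.0], [0.0, 0.0, 0.1 * z[2]]]
print(levy_area_coefficients(diag, [1.0, 2.0, 0.5]))
# Columns b_1 = (1, 0), b_2 = (0, x_1): the second component is non-zero
skew = lambda z: [[1.0, 0.0], [0.0, z[0]]]
print(levy_area_coefficients(skew, [0.7, 1.3]))
```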



VI. Conclusions 

Strong convergence properties of discretizations of stochastic differential equations (SDEs) are very important in financial applications. Numerical examples in the paper demonstrate, as expected, strong orders of convergence of 0.5 and 1.0 for the Euler and Milstein schemes, respectively. To obtain a strong order of 1.0 with the Milstein scheme, one has to apply the scheme to the vector form of the SDE, use independent Wiener processes, and correctly compute the double stochastic integrals or, equivalently, the Levy Areas.
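The expected 0.5 and 1.0 strong orders can be reproduced on a scalar test problem. The Python sketch below is not the experiment reported in the paper. It assumes geometric Brownian motion $dX = \lambda X\,dt + \sigma X\,dW$, where $\sigma$ is a scalar here, with exact solution $X(T) = X_0 \exp((\lambda - \sigma^2/2)T + \sigma W(T))$. It also assumes the illustrative values $\lambda = 2$, $\sigma = 1$, $X_0 = 1$, $T = 1$ and 1000 sample paths. With a single Wiener process the noise is trivially commutative, so no Levy Area appears. The function names are hypothetical.

```python
import math
import random


def strong_errors(lam=2.0, sig=1.0, x0=1.0, T=1.0, fine_steps=512,
                  ratios=(1, 2, 4, 8, 16), n_paths=1000, seed=1):
    """Monte Carlo estimate of E|X_N - X(T)| for Euler and Milstein on
    dX = lam*X dt + sig*X dW, using the exact GBM solution as reference.

    Returns a list of (dt, euler_error, milstein_error), one per ratio.
    """
    rng = random.Random(seed)
    dt_fine = T / fine_steps
    sums = {r: [0.0, 0.0] for r in ratios}
    for _ in range(n_paths):
        dw = [rng.gauss(0.0, math.sqrt(dt_fine)) for _ in range(fine_steps)]
        x_exact = x0 * math.exp((lam - 0.5 * sig ** 2) * T + sig * sum(dw))
        for r in ratios:
            dt = r * dt_fine
            xe = xm = x0
            for k in range(0, fine_steps, r):
                dW = sum(dw[k:k + r])  # coarse increment on the same path
                xe += lam * xe * dt + sig * xe * dW
                xm += (lam * xm * dt + sig * xm * dW
                       + 0.5 * sig ** 2 * xm * (dW ** 2 - dt))
            sums[r][0] += abs(xe - x_exact)
            sums[r][1] += abs(xm - x_exact)
    return [(r * dt_fine, e / n_paths, m / n_paths)
            for r, (e, m) in sums.items()]


def fitted_order(dts, errors):
    """Least-squares slope of log(error) against log(dt)."""
    xs = [math.log(d) for d in dts]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)
```

To use it, call `rows = strong_errors()`, then pass `[r[0] for r in rows]` to `fitted_order` together with either the Euler or the Milstein error column. This should give slopes near 0.5 and 1.0 respectively; the exact values depend on the seed and the number of paths. Each coarse increment is summed from the same fine increments, so every step size is compared on identical Brownian paths.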

We have shown that, under certain conditions, the orthogonal θ scheme for multi-dimensional SDEs can achieve the first order strong convergence of the Milstein discretization. It does so without the expensive simulation of Levy Areas, even when the commutativity condition is not satisfied. By contrast, the Milstein scheme with the Levy Area set to zero has only 0.5 strong order convergence.
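The loss of order when the Levy Area is dropped can be seen on a minimal non-commutative example. The sketch below assumes the hypothetical two-dimensional SDE $dX_1 = dW_1$, $dX_2 = X_1\,dW_2$ with $X(0) = 0$. Here $L^{1}b_{2,2} = 1$ and $L^{2}b_{2,1} = 0$, so the Lie bracket coefficient equals 1 and the commutativity condition fails. The Milstein step replaces the double integral $I_{(1,2)}$ by $\tfrac{1}{2}\Delta W_1 \Delta W_2$, which amounts to setting the Levy Area to zero. The reference solution is a fine-grid Ito sum on the same Brownian path. The function names are hypothetical.

```python
import math
import random


def milstein_no_levy_errors(fine_steps=256, ratios=(2, 4, 8, 16, 32),
                            n_paths=1000, T=1.0, seed=7):
    """Mean |error| in X2(T) of the Milstein scheme with the Levy Area set to
    zero, for dX1 = dW1, dX2 = X1 dW2 with X1(0) = X2(0) = 0.

    The reference X2(T) is the left-point (Ito) sum of X1 dW2 on the fine
    grid of the same Brownian path. Returns [(dt, mean_abs_error), ...].
    """
    rng = random.Random(seed)
    dt_fine = T / fine_steps
    sd = math.sqrt(dt_fine)
    totals = {r: 0.0 for r in ratios}
    for _ in range(n_paths):
        dw1 = [rng.gauss(0.0, sd) for _ in range(fine_steps)]
        dw2 = [rng.gauss(0.0, sd) for _ in range(fine_steps)]
        x1 = x2_ref = 0.0
        for a, b in zip(dw1, dw2):
            x2_ref += x1 * b
            x1 += a
        for r in ratios:
            x1 = x2 = 0.0
            for k in range(0, fine_steps, r):
                d1 = sum(dw1[k:k + r])
                d2 = sum(dw2[k:k + r])
                # Only non-zero Milstein coefficient here is L^1 b_{2,2} = 1.
                # Its double integral I_(1,2) = d1*d2/2 + A_(1,2) is
                # approximated by d1*d2/2, i.e. the Levy Area is dropped.
                x2 += x1 * d2 + 0.5 * d1 * d2
                x1 += d1  # X1 = W1 is integrated exactly
            totals[r] += abs(x2 - x2_ref)
    return [(r * dt_fine, tot / n_paths) for r, tot in totals.items()]


def fitted_order(dts, errors):
    """Least-squares slope of log(error) against log(dt)."""
    xs = [math.log(d) for d in dts]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)
```

Each coarse step commits an error equal to a discretized Levy Area with variance $\Delta t^2/4$. Summing $T/\Delta t$ independent steps gives a root-mean-square error of $\tfrac{1}{2}\sqrt{T\,\Delta t}$, so the fitted slope should come out close to 0.5.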

The bias or error in computing the rotation angle θ that makes the Lie bracket equal to zero in the orthogonal scheme (θ scheme) is crucial to obtaining a better convergence order. When the integrability conditions are satisfied, one can use the formula for θ to obtain the rotation angle and achieve first order strong convergence. Otherwise, one has to use the 3-Dimensional transformation and check the magnitude of the Lie brackets to decide whether it is likely to give computational savings in the solution of the system. The numerical results in this research show a better strong order of convergence than the standard Milstein scheme (without the simulation of the Levy Area) when an orthogonal transformation is applied.
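Checking the magnitude of the Lie brackets, as suggested above, can be automated when the diffusion coefficients are available as code. The sketch below assumes a hypothetical interface in which `b(x)` returns the $d \times M$ matrix $(b_{i,j})$ as nested lists, and assumes the vector fields do not depend on time. It approximates $L^{j_1} b_{i,j_2} = \sum_k b_{k,j_1}\,\partial b_{i,j_2}/\partial x_k$ by central differences and returns $L^{j_1} b_{i,j_2} - L^{j_2} b_{i,j_1}$. These coefficients vanish for all $i, j_1, j_2$ exactly when the noise is commutative.

```python
def lie_bracket_coefficients(b, x, h=1e-6):
    """Return H[i][j1][j2] = L^{j1} b_{i,j2} - L^{j2} b_{i,j1} at the point x.

    b(x) must return the d x M diffusion matrix as nested lists, and
    L^j = sum_k b_{k,j} d/dx_k is approximated with central differences.
    """
    d = len(x)
    bx = b(x)
    m = len(bx[0])
    # grad[k][i][j] approximates the partial derivative of b_{i,j} w.r.t. x_k
    grad = []
    for k in range(d):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        bp, bm = b(xp), b(xm)
        grad.append([[(bp[i][j] - bm[i][j]) / (2.0 * h) for j in range(m)]
                     for i in range(d)])

    def ito_op(j, i, jj):
        # L^j applied to b_{i,jj}
        return sum(bx[k][j] * grad[k][i][jj] for k in range(d))

    return [[[ito_op(j1, i, j2) - ito_op(j2, i, j1) for j2 in range(m)]
             for j1 in range(m)] for i in range(d)]


def max_bracket(b, x, h=1e-6):
    """Largest absolute Lie-bracket coefficient at x (0 for commutative noise)."""
    return max(abs(v) for plane in lie_bracket_coefficients(b, x, h)
               for row in plane for v in row)
```

Consider the non-commutative example $b = ((1, 0), (0, x_1))$. There the largest coefficient is 1 at every point. For diagonal noise $b = \mathrm{diag}(x_1, x_2)$ it is zero.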

Standard convergence theory for numerical methods for SDEs (e.g. as in Kloeden and Platen [9]) makes a global Lipschitz assumption on the coefficients. However, most of the SDE models mentioned and used in the computational experiments do not satisfy such global Lipschitz conditions (e.g. example (36)). The numerical results are therefore not simply confirming a proven theory. They give numerical evidence that the conclusions about strong order remain true in circumstances where no theory currently exists.

When one prices an exotic option or wants to approximate a portfolio, the particular SDE used is not important. What really matters is that the SDE correctly approximates the real distribution of the process. For this reason, the θ scheme, which gives a better strong order of convergence, can be applied to hedging, portfolio optimization and the pricing of exotic options. In [13] and [14], we have demonstrated that using the strong convergence of the θ scheme substantially reduces the computational cost of pricing exotic options (90% of the computation time).

In summary, this paper proposes a better numerical approximation for multidimensional SDEs. We introduce a new scheme, or discrete time approximation, that attains a better convergence order than the standard Milstein scheme without the simulation of the expensive Levy Area. We show when the conditions of the 2-Dimensional problem permit this and give an exact solution for the orthogonal transformation. Our applications focus on continuous time diffusion models for volatility and variance, together with their discrete time approximations (ARV).

For future work, we think that investigating and testing the multi-dimensional θ scheme for multi-dimensional SDEs (d > 3) will be very interesting. For some parameters, the new orthogonal scheme should provide considerable computational time savings when calculating strong and weak solutions.
IX. Appendix

The Appendix outlines a theorem and the mathematical operations required to understand both the Milstein and θ schemes.

A. Milstein Scheme (Ito Operators)

We start with the 2D Ito SDE case with a 2D independent 
Wiener process: 



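$$d\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} dt + \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix} \begin{pmatrix} dW_{1,t} \\ dW_{2,t} \end{pmatrix}, \qquad (41)$$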



where $a_i$ and $b_{i,k}$ are assumed to satisfy the linear growth and global Lipschitz conditions ([4], p. 548) for existence and uniqueness of a strong solution to the SDE (41). The 1.0 strong order Milstein scheme for (41) with time step $\Delta t$ using Ito operators is [9]:



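$$\begin{pmatrix} X_{1,t+\Delta t} \\ X_{2,t+\Delta t} \end{pmatrix} = \begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} + \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \Delta t + \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix} \begin{pmatrix} \Delta W_{1,t} \\ \Delta W_{2,t} \end{pmatrix} + R_M, \qquad (42)$$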



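where, using Levy Areas, the components of $R_M$ are equal to:

$$R_{M,i} = \frac{1}{2}\sum_{j=1}^{2} L^{j} b_{i,j}\left(\Delta W_{j,t}^{2} - \Delta t\right) + \frac{1}{2}\left(L^{1} b_{i,2} + L^{2} b_{i,1}\right)\Delta W_{1,t}\,\Delta W_{2,t} + H_i\,A_{(1,2),t}, \qquad (43)$$

$$H_i = L^{1} b_{i,2} - L^{2} b_{i,1}, \qquad i = 1, 2,$$

and $A_{(1,2),t}$ denotes the Levy Area over $[t, t+\Delta t]$.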



The Ito operators are defined by:

$$L^{j} = \sum_{k=1}^{2} b_{k,j}\,\frac{\partial}{\partial x_k}, \qquad j = 1, 2.$$
^2,JCi =Ml,2 Xl + &2,2&1,2*,


C 2,Xi = +kA,2 x , 


^3,11 = &l,l&l,2 Xl + Ml,2x, 


Q,X 2 =^1,1^2, 2 Xl +M2,2 x , 


Ci,xi =kAh, 


Q,X 3 =41,2^, +^,2^1x, 



For example (2), we get:



Ci ; Xi — "H 


^1^2 P 


(^2 ; Xi = 0 


C,x 2 = p 2 ti v 


II 


C 3i X 2 =poi x + pMy 




^■4^X2 — 



(45) 



Checking the commutativity conditions (44) in example (2), 
we have: 

pi 3x; = 0 ’ pp ax^ = 0 ' 



For $\sigma \neq 0$ and $\xi \neq 0$, we need:

$$\frac{\partial \xi}{\partial X_1} = 0 \qquad \text{and} \qquad \frac{\partial \sigma}{\partial X_2} = 0.$$

B. 2D θ Scheme (Orthogonal Milstein)
If one applies an orthogonal transformation to (41), one gets:

$$d\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} dt + \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix} \begin{pmatrix} d\tilde W_{1,t} \\ d\tilde W_{2,t} \end{pmatrix}, \qquad (46)$$



where: 



A more general, but important, special case is that of commutative noise, in which the diffusion matrix of (41) satisfies the commutativity condition ([9], p. 348):

$$L^{1} b_{i,2} = L^{2} b_{i,1}, \qquad i = 1, 2. \qquad (44)$$

If conditions (44) are satisfied, $H_i = 0$ in (43) and we do not need to simulate the Levy Areas. After some computation, (43) is equal to:
The 1.0 strong order Milstein scheme for (46) with time step $\Delta t$ using Ito operators is [9]:
where, using Levy Areas, $R_M$ is equal to:
After some computation, (49) is equal to:

Rm = 

+ 



where: 

To make the coefficient of the Levy Area in (50) equal to zero, one needs:

^ ^ 90 _ ^ (fr|i + 6| 2 ) ~ (614624 + 61,262,2) 

( 61462,2 ~ 61 , 262 , if 



After some computation, (53) is equal to:



(AW 2 2 t -Af) ( 54 ) 
For example (7), we get:
C. 3D θ Scheme

We start with the following 3-Dimensional Ito SDE with a 
2-Dimensional Wiener process: 
where, using Levy Areas, $R_M$ is equal to:
After some algebra, the Lie bracket is equal to:
For example (13), we get:
and $a_i$, $b_{i,j}$ are assumed to satisfy the linear growth and global Lipschitz conditions. The 1.0 strong order Milstein scheme for (52) with time step $\Delta t$ is [9]:
D. Orthogonal Transformation Theorem

In this section, we present a theorem proving the following. If an orthogonal transformation is applied to a standard normally distributed process $dW_t$, then the components of the new orthogonal process $d\tilde W_t$ are independent, and $d\tilde W_t$ has the same distribution as the original process $dW_t$.

Theorem 2: Distribution of an Orthogonal Standard Normal 
Distributed Process 

If $dW_{1,t}$ and $dW_{2,t}$ are two independent, identically normally distributed processes with expectation 0 and variance $\sigma^2$, and we apply an orthogonal transformation $d\tilde W_t = T\,dW_t$, then:

A. The new orthogonal random processes $d\tilde W_{1,t}$ and $d\tilde W_{2,t}$ are independent.

B. $d\tilde W_t$ and $dW_t$ have the same distribution.
Proof: 

A. If $dW_{1,t}$ and $dW_{2,t}$ are independent, then:

$$E\left[dW_{1,t}\,dW_{2,t}\right] = 0.$$

Doing the same for the orthogonal Wiener process: 
B. The probability density function (PDF) of an N-Dimensional multivariate normal is [5]:
where $\mu = [\mu_1, \mu_2, \ldots, \mu_N]$ is the mean and $\Sigma$ is the covariance matrix (a positive definite real $N \times N$ matrix). For $dW_t$ we have:
Since both are multivariate normal, $dW_t$ and $d\tilde W_t$ have the same distribution if they have the same mean and covariance matrix:
Input Data Processing Techniques in Intrusion Detection Systems - Short Review
Abstract: In this paper, intrusion detection systems (IDSs) are classified according to the techniques they apply to processing input data. This classification is complex because these techniques are tightly coupled in actual implemented systems. Eleven input data processing techniques associated with intrusion detection systems are identified and then grouped into more abstract categories. Some approaches are based on artificial intelligence, such as neural networks, expert systems and agents. Others are computationally based, such as Bayesian networks and fuzzy logic. Finally, some are based on biological concepts, such as immune systems and genetics. The characteristics of each technique, and systems employing it, are also described.

I. Introduction 

When traditionally classifying intrusion detection systems (IDSs) as misuse, anomaly or hybrid, the systems are grouped according to the technique they use to detect intrusions. For example, misuse-based IDSs match already stored attack signatures against the audit data gathered while the monitored system is or was running. Anomaly-based IDSs use models of normal behavior, and any deviation from such behavior is identified as an intrusion. Another traditional classification categorizes an IDS according to its setup as network-based, host-based or hybrid. Network-based systems monitor network activities, whereas a host-based system monitors the activities of a single system for intrusion traces [1]. In general, IDSs may apply many techniques to detect intrusions and improve detection, such as neural networks, expert systems, agents, Bayesian networks, fuzzy logic, immune systems and genetics. Little attention has been given to classifying the processing techniques applied to the input data provided to the IDS. In this paper we classify the input data processing techniques used with IDSs, whether or not the IDS uses the same technique to detect intrusions. In Section II, an abstract classification of the different input data processing techniques used with IDSs is presented.

Eleven input data processing techniques associated with IDSs are identified and then grouped into more abstract categories. Section III presents a general description of each technique, some of its advantages and disadvantages, and examples of systems employing it.

II. Classification of Input Data Processing Techniques in IDSs

In this paper, we are concerned with the techniques used to process the input data considered when designing and implementing IDSs. Classifying such techniques is not easy because a combination of techniques may be used in an actual implemented system. However, identifying them individually helps to better understand the merits and limitations of each, and how to improve a technique's performance by using another. Eleven techniques that are currently in wide use for processing the input data of IDSs are identified (shown at the lower level of Fig. 1). They are then grouped into more abstract categories, shown at the upper levels of Fig. 1. This is important because the characteristics of each technique are strongly affected by the category (or categories) it belongs to. In the lower level of Fig. 1, techniques such as Agents and Data Mining belong to the Intelligent Data Analysis category, as indicated by the dotted relation between the Data Analysis and AI categories. Expert systems and fuzzy logic are intelligent, model-based, rule-based systems, as shown by the dotted relation between the Rule based and AI categories in Fig. 1. Each item in Fig. 1 is explained next, along with some identified characteristics.




Fig. 1. Data processing techniques applied to input data processed by Intrusion Detection Systems
A. Rule Based

In a rule-based IDS, knowledge of known intrusions is supplied as input in a codified rule format. The input data represents identified intrusive behavior and categorizes intrusion attempts as sequences of user activities that lead to compromised system states. The IDS takes as input the predefined rules as well as the current audit data and checks whether a rule is fired. In general, rule bases are affected by system hardware or software changes and require updates by system experts as the system is enhanced or maintained. This input data technique is very useful in environments where physical protection of the computer system is not always possible (e.g., a battlefield situation) but strong protection is required [http://www.sei.cmu.edu/str/descriptions/rbid.html].

In general, rule based systems can be:

I) State-based: in the audit trails, intrusion attempts are defined as sequences of system states leading from an initial state to a final compromised state, represented in a state transition diagram. The two inputs to the IDS are the audit trail and the state transition diagrams of known penetrations, which are compared against each other by an analysis tool. One advantage of a state-based representation of data is that it is independent of the audit trail record and can detect cooperative attacks and attacks that span multiple user sessions. However, some attacks cannot be detected because they cannot be modeled with state transitions [http://www.sei.cmu.edu/str/descriptions/rbid.html].

II) Model-based: intrusion attempts in input data can be modeled as sequences of user behavior. This approach allows more data to be processed, provides more intuitive explanations of intrusion attempts, and can predict the intruder's next action. More general representations of penetrations can be generated, since intrusions are modeled at a higher level of abstraction. However, if an attack pattern does not occur in the appropriate behavior model, it cannot be detected [http://www.sei.cmu.edu/str/descriptions/rbid.html].



B. Artificial Intelligence 

AI improves algorithms by employing problem-solving techniques used by human beings, such as learning, training and reasoning. One of the challenges of using AI techniques is that they require a large amount of audit data in order to compute the profile rule or pattern sets. Information about the system is extracted from the audit trails, and patterns describing the system are generated. In general, AI can be employed in two ways. (1) Evolutionary (biologically driven) methods are mechanisms inspired by biological evolution, such as reproduction, mutation and recombination. (2) Machine learning is concerned with the design and development of algorithms and techniques that allow computers to learn. The major focus of machine learning research is to extract information from data automatically [2].

C. Data Analysis 

With data analysis, data is transformed in order to extract useful information and reach conclusions. It is usually used to confirm or refute an existing model, or to extract the parameters necessary to adapt a theoretical model to an experimental one. Intelligent data analysis indicates that the application performs some analysis associated with user interaction and then provides insights that are not obvious. One problem faced when applying such an approach is that most application logs (input information) do not conform to a specific standard. Logs should be analyzed to find commonalities, and different types of logs should be grouped. Another problem is the existence of noise, missing values and inconsistent data in the actual log information. Attackers may exploit the fact that logs may not record all information. Finally, real-world data sets tend to be very large and multidimensional, which requires data cleaning and data reduction [3].

D. Computational Methods 

Computational intelligence research aims to use learning, adaptive, or evolutionary algorithms to create programs. These algorithms allow systems to operate in real time and detect system faults quickly. However, there are costs associated with creating audit trails and maintaining input user profiles, as well as some risks. For example, because user profiles are updated periodically, the system may accept a new user behavior pattern under which an attack can be safely mounted. This is why it is sometimes difficult to define user profiles, especially for users with inconsistent work habits. In general, there are two types of IDSs that use a computational method. (1) Statistics-based IDSs are employed to identify audit data that may potentially indicate intrusive behavior. These systems analyze input audit trail data by comparing it to normal behavior to find security violations. (2) Heuristics-based IDSs use a heuristic, which can be a function that estimates the cost of the cheapest path from one node to another [http://www.sei.cmu.edu/str/descriptions/sbid.html].

III. Capabilities and Examples of Processing Techniques of Input Data Used by IDSs

Because some IDS data processing techniques are similar and interact closely, classifying them is complex. However, we believe that the eleven identified categories capture most of the well-known types. For example, in Fig. 1, expert systems and fuzzy logic both belong to the AI and rule-based categories, yet they have distinguishing characteristics and uses. The output of an expert system is specific, the data used to build the system is complete, and the set of rules is well defined. Fuzzy logic, in contrast, is usually used in systems where the output is not well defined and is continuous between 0 and 1.

A. Bayesian Networks 

Bayesian networks are used to describe the conditional probability of a set of possible causes for a given observed event. This probability is computed from the probability of each cause and the conditional probability of the outcome given each cause. Bayesian networks are suitable for extracting complex patterns from sizable amounts of input information that may also contain significant levels of noise. Several systems have been developed using Bayesian network concepts. For example, Scott's IDS [4] is based on stochastic models of user and intruder behavior combined using Bayes' theorem, which mitigates the complexity of network transactions that have complicated distributions. Intrusion probabilities can be calculated, and dynamic graphics allow investigators to use the evidence to navigate around the system.
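The core computation behind this description is Bayes' theorem over a set of mutually exclusive causes. The sketch below shows only that single step, not a full Bayesian network with dependencies among many variables. The probabilities are hypothetical values chosen for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, exhaustive causes.

    priors:      {cause: P(cause)}
    likelihoods: {cause: P(event | cause)}
    returns:     {cause: P(cause | event)}
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(event) by the law of total probability
    if evidence == 0:
        raise ValueError("the observed event is impossible under every cause")
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical numbers: 1% of sessions are intrusions; the alert fires for
# 95% of intrusions and for 5% of normal sessions.
post = posterior({"normal": 0.99, "intrusion": 0.01},
                 {"normal": 0.05, "intrusion": 0.95})
print(round(post["intrusion"], 3))  # 0.161
```

The alert detects 95% of intrusions, yet the low prior keeps the posterior probability of an intrusion near 16%. This is why prior probabilities matter when evidence is combined.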

B. Neural Networks 

Training enables neural networks to modify the state of a system by discriminating between classes of inputs. They also learn the relationship between input and output vectors and generalize it to extract new input and output relationships. They are suitable when the identification and classification of network activities are based on incomplete and limited input data sources. They can process data from a number of sources and accept nonlinear signals as input, but they need a large sample of input information. Finally, neural networks are not suitable when the information is imprecise or vague, and they are unable to combine numeric data with linguistic or logical data. For example, Bivens et al. [5] employed the time-window method for detection and were able to recognize long multi-packet attacks. In the preprocessing step, they identified aggregate trends in the network traffic by looking at only three packet characteristics. Once the system was trained, the neural network was able to perform real-time detection on the input data.

C. Data Mining 

Data mining refers to a set of techniques that extract previously unknown but potentially useful data from large stores of system logs. One of the fundamental data mining techniques used in intrusion detection is associated with decision trees [6] that detect anomalies in large databases. Another technique uses segmentation, in which patterns of unknown attacks are extracted from a simple audit and then matched with previously warehoused unknown attacks [7]. Another data mining technique is associated with finding association rules, which extracts previously unknown knowledge about new attacks and builds normal behavior patterns [8]. Data mining techniques allow regularities and irregularities to be found in large input data sets. However, they are memory intensive and require double storage: one store for the normal IDS data and another for the data mining. The system of Lee, Stolfo and Mok [7] was able to detect anomalies using predefined rules. However, it needed a supervisor to update the system with the appropriate rules for certain attacks. Their rule generation methodology first defines an association rule that identifies the relation between rules and specifies the confidence of the rule.
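Rule confidence, mentioned above for the system of Lee et al., is conventionally defined from support counts. The support of an itemset A is the fraction of records containing A, and confidence(A → C) = support(A ∪ C) / support(A). The sketch below applies these standard definitions to hypothetical audit records, each reduced to a set of categorical features. It does not reproduce their rule-generation methodology.

```python
def support(records, itemset):
    """Fraction of audit records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for rec in records if itemset <= rec) / len(records)


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent | consequent) / support(antecedent)."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical audit records, each reduced to a set of features.
records = [
    {"login_fail", "root", "night"},
    {"login_fail", "root"},
    {"login_ok", "user"},
    {"login_fail", "user", "night"},
]
print(support(records, {"login_fail"}))               # 0.75
print(confidence(records, {"login_fail"}, {"root"}))  # about 0.667
```

A rule is typically kept only when both its support and its confidence exceed chosen thresholds. The thresholds themselves are a design decision that the text above does not specify.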

D. Agents 

Agents are self-contained processes that can perceive their environment through sensors and act on the environment through effectors. Agents trace intruders and collect input information related only to the intrusion along the intrusion route. They then decide whether an intrusion has occurred from target systems across the network. One of the major disadvantages of agents is that they need a highly secure agent execution environment while collecting and processing input information. It is also difficult to propagate agent execution environments onto large numbers of third-party servers. Several systems have been developed using agents. Spafford and Zamboni [9] introduced Autonomous Agents for Intrusion Detection (AAFID), which uses autonomous agents to perform intrusion detection. Their prototype provides a useful framework for the research and testing of intrusion detection algorithms and mechanisms. Gowadia, Farkas and Valtorta [10] implemented a Probabilistic Agent-Based Intrusion Detection (PAID) system with a cooperative agent architecture. In their model, agents are allowed to share their beliefs and perform updates. Agent graphs are used to represent intrusion scenarios. Each agent is associated with a set of input, output, and local variables.

E. Immune Based 

Immune-based IDSs are developed from human immune system concepts and can perform tasks similar to innate and adaptive immunity. In general, audit data representing the appropriate behavior of services is collected, and then a profile of normal behavior is generated. One challenge is to differentiate between self and non-self data. Attempts to control this cause scaling problems and holes in the detector sets.

There have been several attempts to implement immunity-based systems. Some have experimented with innate immunity, which is the first line of defense in the immune system and is able to detect known attacks. For example, Twycross and Aickelin [11] implemented libtissue, which uses a client/server architecture acting as an interface for a problem using immune-based techniques. Pagnoni and Visconti [12] implemented a native artificial immune system (NAIS) that protects computer networks. Their system was able to discriminate between normal and abnormal processes, detect and protect against new and unknown attacks, and accordingly deny foreign processes access to the server. For adaptive immunity, two approaches have been studied: negative selection and danger theory. Kim and Bentley [13] implemented a dynamic clonal selection algorithm that employs negative selection by comparing immature detectors to a given antigen set. Immature detectors that bind to an antigen are deleted, and the remaining detectors are added to the accepted population. If a memory detector matches an antigen, an alarm is raised. A recent approach to implementing adaptive immunity uses the danger theory concept [14]. Danger theory suggests that an immune response reacts to danger signals resulting from damage to the cell, and not only to the cell being foreign, or non-self, to the body.
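Negative selection can be sketched in a few lines. This is the generic form rather than Kim and Bentley's dynamic clonal selection. It assumes binary strings, a hypothetical self set of normal patterns, and the r-contiguous matching rule, under which two strings match if they agree in at least r consecutive positions. Randomly generated immature detectors that match any self string are deleted, and the survivors raise an alarm when they match a sample. All function names are hypothetical.

```python
import random


def r_contiguous_match(a, b, r):
    """True if equal-length strings a and b agree in at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def negative_selection(self_set, n_detectors, length, r, seed=0,
                       max_tries=100_000):
    """Generate random immature detectors; delete any that match a self string."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)  # survived censoring: mature detector
    return detectors


def is_non_self(sample, detectors, r):
    """Raise an alarm (True) if any mature detector matches the sample."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)
```

By construction, no mature detector matches a self string, so normal patterns never trigger an alarm. Parts of the non-self space may still go uncovered, which corresponds to the "holes" in the detector sets mentioned above.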

F. Genetic Algorithms 

Genetic algorithms are a family of problem-solving techniques based on evolution and natural selection. Potential solutions to the problem are encoded as sequences of bits, characters, or numbers. The unit of encoding is called a gene, and the encoded sequence is called a chromosome. The genetic algorithm begins with a population of chromosomes and an evaluation function that measures the fitness of each chromosome. The algorithm then uses reproduction and mutation to create new solutions. In the system of Shon and Moon [15], the Enhanced Support Vector Machine (Enhanced SVM) provides unsupervised learning and low false alarm capabilities. A profile of normal packets is created without preexisting knowledge. After filtering the packets, the authors use a genetic algorithm to extract optimized information from raw internet packets. The flow of packets, which is based on temporal relationships during data preprocessing, is used in the SVM learning.
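The generic loop described above (encode, evaluate fitness, reproduce, mutate) is sketched below. It is not Shon and Moon's feature-selection setup. The bit-string encoding, truncation selection, one-point crossover, mutation rate and the toy fitness (the number of 1 bits) are all assumptions made for illustration.

```python
import random


def genetic_algorithm(fitness, length, pop_size=30, generations=60,
                      mutation_rate=0.02, seed=0):
    """Evolve bit-string chromosomes and return the fittest one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]       # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]            # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = genetic_algorithm(sum, 20)  # toy fitness: count of 1 bits
print(sum(best), best)
```

Because the fitter half is carried into the next generation unchanged, the best fitness never decreases. In an IDS setting, the fitness function would instead score a candidate feature subset or rule, for example by its detection rate on labeled data.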

G. Fuzzy Logic 

Fuzzy logic is a system of logic that mimics human decision making. It deals with the concept of partial truth, and its rules can be expressed imprecisely. Several systems have been developed using fuzzy logic. Abraham et al. [16] modeled a Distributed Soft Computing-based IDS (D-SCIDS) as a combination of different classifiers to model lightweight and heavyweight IDSs. Their empirical results show that a soft computing approach could play a major role in intrusion detection. In their experiments, the fuzzy classifier gave 100% accuracy for all attack types when all attributes were used. Abadeh, Habibi and Lucas [17] describe a fuzzy genetics-based learning algorithm and discuss its use to detect intrusions in a computer network. They suggested a new fitness function capable of producing more effective fuzzy rules, which also increased the detection rate as well as false alarms. Finally, they suggested combining two different fitness function methods in a single classifier, so that the advantages of both fitness functions are used concurrently.
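Partial truth is usually expressed with membership functions, which map a crisp value to a degree in [0, 1]. Rules then combine these degrees with operators such as min for AND. The sketch below assumes triangular membership functions and a hypothetical two-condition rule. The feature names and thresholds are illustrative and do not come from the systems cited above.

```python
def triangular(x, a, b, c):
    """Degree of membership (0..1) of x in a triangular fuzzy set that
    rises from a, peaks at b and falls to zero at c (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_and(*degrees):
    """Min t-norm: the usual fuzzy AND."""
    return min(degrees)


# Hypothetical rule: IF failed logins are HIGH AND many distinct ports are
# probed THEN the session is SUSPICIOUS, to the degree the rule fires.
failed_high = triangular(6, 3, 8, 13)    # 0.6
ports_many = triangular(30, 10, 50, 90)  # 0.5
print(fuzzy_and(failed_high, ports_many))  # 0.5
```

Unlike a crisp threshold, the rule fires partially (to degree 0.5 here). The output can then be aggregated with other rules instead of producing a yes/no alarm.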



H. Expert Systems 

Expert system-based IDSs build statistical profiles of entities such as users, workstations and application programs, and use statistically unusual behavior to detect intruders. They work on a previously defined set of rules that represent a sequence of actions describing an attack. With expert systems, all security-related events incorporated in an audit trail are translated into if-then-else rules. The expert system can also hold and maintain significant levels of information. However, acquiring rules from the input data is a tedious and error-prone process. The system of Ilgun, Kemmerer and Porras [18] detects intrusions in real time using state transition analysis. The model is represented as a series of state changes that lead from an initial secure state to a target compromised state. The authors developed USTAT, a UNIX-specific prototype of a state transition analysis tool (STAT). STAT is a rule-based expert system that is fed with the state transition diagrams. In general, STAT extracts the state transition information recorded in the target system's audit trails and compares it with a system-specific, rule-based representation of known attacks.

I. Signature Analysis or Pattern Matching

In this approach, the semantic description of an attack is transformed into the appropriate audit trail format, which represents an attack signature. An attack scenario can be described, for example, as a sequence of audit events that a given attack generates. Detection is accomplished using text string matching mechanisms. Human expertise is required to identify and extract non-conflicting elements or patterns from the input data. Kumar's system [19] is based on the complexity of matching. Based on the desired accuracy of detection, he developed a classification to represent intrusion signatures and used different encodings of the same security vulnerability. His pattern specification incorporated several abstract requirements to represent the full range and generality of intrusion scenarios: context representation, follows semantics, specification of actions and representation of invariants.
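When an attack signature is written as an ordered sequence of audit events, detection reduces to finding that sequence in the event stream. The sketch below uses hypothetical event names and exact string equality, and it allows unrelated events to be interleaved between the signature's steps. It is far simpler than Kumar's pattern specification, which adds context, invariants and richer semantics.

```python
def match_signature(events, signature):
    """Return the index in `events` at which the ordered `signature` is
    completed, or None if it never completes. Other events may be
    interleaved between the signature's steps."""
    if not signature:
        return None
    step = 0
    for i, event in enumerate(events):
        if event == signature[step]:
            step += 1
            if step == len(signature):
                return i
    return None


# Hypothetical audit stream and three-step attack signature.
events = ["login", "copy_shell", "ls", "set_suid", "exec_suid", "logout"]
signature = ["copy_shell", "set_suid", "exec_suid"]
print(match_signature(events, signature))  # 4
```

Greedily advancing on the first matching event is sufficient for this kind of in-order detection. Signatures with alternatives, time windows or shared variables need a more general matcher, such as the state machines or Petri nets described next.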

J. State Machines 

State machines model behavior as a collection of states, transitions and actions. An attack is described by a set of goals and transitions that an intruder must achieve to compromise a system. Several systems have been developed using this technique. Sekar et al. [20] employ state-machine specifications of network protocols, augmented with information about the statistics that need to be maintained to detect anomalies. The protocol specifications simplified the manual feature selection process used in other anomaly detection approaches. The specification language also made it easy to apply their approach to other layers, such as the HTTP and ARP protocols. Peng, Leckie and Ramamohanarao [21] proposed a framework for distributed detection systems. They improved the efficiency of their system by using a heuristic to initialize the broadcast threshold and a hierarchical system architecture. They also presented a scheme that detects the abnormal packets caused by the reflector attack by analyzing the inherent features of that attack.

K. Petri Nets 

Colored Petri nets are used to specify control flow in asynchronous concurrent systems. They graphically depict the structure of a distributed system as an annotated directed bipartite graph of place nodes, transition nodes and directed arcs connecting places with transitions. Srinivasan and Vaidehi [22] present a general model based on timed colored Petri nets. The model is capable of handling patterns generated to model attack behavior as a sequence of events. It also allows an attack to be flagged when the behavior of one or more processes matches the attack behavior. Their graphical representation of a timed colored Petri net gives a straightforward view of the relations between attacks.
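The firing rule that underlies Petri-net attack models can be sketched with an ordinary place/transition net. This sketch deliberately omits the colors (typed tokens) and timing used by Srinivasan and Vaidehi. The places and transitions model a hypothetical three-step attack, and a transition is represented as a pair of dictionaries (input places, output places) mapping place names to token counts.

```python
def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, transition):
    """Fire an enabled transition and return the new marking (input unchanged)."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new = dict(marking)
    for place, n in inputs.items():
        new[place] = new.get(place, 0) - n
    for place, n in outputs.items():
        new[place] = new.get(place, 0) + n
    return new


# Hypothetical attack model: each transition consumes the token of one stage
# and produces the next; a token in "compromised" means the scenario matched.
copy_shell = ({"start": 1}, {"shell_copied": 1})
set_suid = ({"shell_copied": 1}, {"suid_set": 1})
exec_suid = ({"suid_set": 1}, {"compromised": 1})

m = {"start": 1}
for t in (copy_shell, set_suid, exec_suid):
    m = fire(m, t)
print(m["compromised"])  # 1
```

In an IDS, each observed audit event would attempt to fire the matching transition, and an alarm would be raised once the final place is marked. Colored and timed nets add token attributes and time constraints to the same enabling rule.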

IV. Conclusion 

Choosing an IDS to deploy in an environment may seem simple; however, with the different components, types, and classifications, such a decision is quite complex. There have been many attempts to classify IDSs as a means to facilitate choosing better solutions. In this paper, we classified IDSs according to the data processing techniques applied to input information. Careful design may allow an IDS to be implemented correctly. However, the actual merits and limitations of each approach, which are also discussed in this paper, indicate that complete security and the various desirable system characteristics cannot be achieved by employing only one implementation approach. The data processing techniques were grouped into general (abstract) categories and were then further expanded into eleven more specialized techniques. 

We discussed and summarized the characteristics of each technique, followed by examples of developed systems using each technique. Fig. 1, for example, helps us understand that we can use the state machine technique to build an IDS, and that we can add intelligence to it and use the expert system technique, with added merits and costs. The merits are the ability to perform intelligent actions and provide intelligent answers; unrealistic actions or answers can be refuted or ignored. It also borrows from statistics the ability to detect intrusions without prior information about the security flaws of a system. Some of the incurred costs are the conflicting requirements of maintaining a high volume of data, which affects throughput, and of selecting appropriate thresholds that lower false positives and false negatives. To conclude, the appropriate technique should be selected carefully. Before development, each organization should state the requirements of its agency and the acceptable costs. Accordingly, the selected system should be able to incorporate most of the requirements, as complete security cannot be achieved. 
Abstract: Reconfigurable antennas have received significant attention for their applications in communications, electronic surveillance, and countermeasures, because they can adapt their properties to achieve selectivity in frequency, bandwidth, polarization, and gain. In this paper, the design of a reconfigurable microstrip antenna operating in two different frequency bands is presented. Switching between the two frequency bands is achieved using RF-MEMS switches. 

Keywords- RF-MEMS, reconfigurable, HFSS, microstrip 
patch. 

I. Introduction 

With tremendous advancement in communication technology and increasing consumer demands, there is a constant need for multifunctional wireless communication devices. Multifunctional systems depend on the co-existence of several antennas and RF components, but as the number of components required in a single system grows, problems such as interference, cost, maintainability, reliability, and weight may arise. Multifunctional antennas provide a solution to these problems: a multifunctional antenna supports multiple functions in a single antenna unit, for example by supporting more than one frequency or radiating in different patterns. 

In some designs, RF-MEMS (Micro-Electro-Mechanical Systems) switches, solid-state switches, or other technologies are used to change the operating frequencies or the radiation pattern of the antenna; such antennas are usually called “Reconfigurable Antennas” [1]. 

In this paper, the design of a reconfigurable antenna operating at two different frequencies is presented. The operating frequency is switched between two values by changing the aperture of the antenna. The operating frequencies are chosen as 2.2 GHz and 3.6 GHz, and RF-MEMS switches are used to change the aperture. 

II. Microstrip patch antenna 



The transmission line model (TLM) is used for designing the patch antenna. 

The width of the patch is calculated first by 

$$W = \frac{c}{2 f_r}\sqrt{\frac{2}{\varepsilon_r + 1}}$$ 

where W is the width of the patch, c is the speed of light in free space, f_r is the resonant frequency, and εr is the substrate dielectric constant. Due to the fringing effect, the antenna appears electrically larger than its physical dimensions. To take this effect into account, a length extension ΔL can be computed from [3]: 

$$\Delta L = 0.412\,h\,\frac{(\varepsilon_{reff} + 0.3)\left(\frac{W}{h} + 0.264\right)}{(\varepsilon_{reff} - 0.258)\left(\frac{W}{h} + 0.8\right)}, \qquad \varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left(1 + 12\,\frac{h}{W}\right)^{-1/2}$$ 

where h is the substrate height and ε_reff is the effective dielectric constant. Since the length has been extended by ΔL on each side of the patch, the effective length is given by 

$$L_{eff} = L + 2\Delta L = \frac{c}{2 f_r \sqrt{\varepsilon_{reff}}}$$ 

The patch resonant length L is therefore given by 

$$L = L_{eff} - 2\Delta L$$ 

Using the TLM approximation, the antenna parameters were calculated for 3.6 GHz. The dielectric substrate chosen here was Rogers RO4032 (εr = 3.2), with substrate height h = 1 mm. To feed the patch antenna, a microstrip feed line can be attached to the center of one of the radiating edges. 

The feed line is a 50-Ω transmission line, whereas the impedance at the edge of the patch is different, so a quarter-wave transformer is required to match the impedance of the feed line to that of the patch; this procedure is called impedance matching. The impedance Z1 of the matching transformer is given by 

$$Z_1 = \sqrt{Z_0 R_L} \qquad (6)$$ 

where Z0 is the impedance of the microstrip feed line and RL is the impedance of the patch. 

fig. 1 Patch at 3.6 GHz 

The width W_t of the transformer is calculated by the formula 

$$\frac{W_t}{h} = \frac{8 e^{A}}{e^{2A} - 2}$$ 

where A is given by 

$$A = \frac{Z_1}{60}\sqrt{\frac{\varepsilon_r + 1}{2}} + \frac{\varepsilon_r - 1}{\varepsilon_r + 1}\left(0.23 + \frac{0.11}{\varepsilon_r}\right)$$ 
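The quarter-wave matching step, Z1 = √(Z0 RL), and the standard narrow-strip microstrip synthesis formula for the transformer width can be evaluated with a short Python sketch. The patch edge resistance RL = 240 Ω below is a hypothetical value chosen only for illustration (the paper does not report RL); εr = 3.2 and h = 1 mm are the substrate values stated in the text. The synthesis formula is only valid for W/h < 2, so the sketch checks that condition.

```python
import math


def quarter_wave_impedance(z0: float, r_load: float) -> float:
    """Characteristic impedance Z1 = sqrt(Z0 * RL) of a quarter-wave matching section."""
    return math.sqrt(z0 * r_load)


def microstrip_width(z: float, eps_r: float, h: float) -> float:
    """Width of a microstrip line of impedance z (ohms) on a substrate of height h.

    Uses W/h = 8 e^A / (e^(2A) - 2), which holds only for narrow lines (W/h < 2).
    """
    a = (z / 60.0) * math.sqrt((eps_r + 1.0) / 2.0) + \
        ((eps_r - 1.0) / (eps_r + 1.0)) * (0.23 + 0.11 / eps_r)
    w_over_h = 8.0 * math.exp(a) / (math.exp(2.0 * a) - 2.0)
    if w_over_h >= 2.0:
        raise ValueError("W/h >= 2: the wide-strip formula is needed instead")
    return w_over_h * h


z1 = quarter_wave_impedance(50.0, 240.0)        # hypothetical RL = 240 ohms
w_t = microstrip_width(z1, eps_r=3.2, h=1.0)    # h in mm, so w_t is in mm
print(f"Z1 = {z1:.1f} ohm, transformer width = {w_t:.3f} mm")
```

With this assumed RL the sketch gives a transformer width of roughly 0.5 mm; a different edge resistance changes Z1 and therefore the width.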
The calculated dimensions are width W = 20.66 mm, length L = 20.66 mm, and transformer width 0.50 mm. These values are based on open-loop formulas, whereas the simulator is based on closed-loop formulas, so these values need to be readjusted for appropriate results. They were adjusted with the help of HFSS, and the corrected dimensions are width 19.51 mm, length 19.51 mm, and transformer width 0.402 mm. 

Similarly, the dimensions of the microstrip patch operating at 2.2 GHz were calculated and adjusted. The adjusted dimensions are width W = 34.2 mm, length L = 34.2 mm, and transformer width 0.47 mm. 

fig. 2 Patch at 2.2 GHz 

III. Reconfigurable microstrip patch antenna design 

After the microstrip patch at 3.6 GHz was designed, a square ring was placed around it (as shown in fig. 3), with a separation of 420 µm between the two, to make the antenna reconfigurable. The dimensions of this ring are the same as those of the patch operating at 2.2 GHz. The two apertures are connected using 8 RF-MEMS switches at different positions. 

When the switches are turned ON, the two patches are connected and the whole structure resonates at 2.2 GHz; when the switches are turned OFF, only the inner patch resonates, at 3.6 GHz. The outer ring acts as a parasitic element in the switch OFF condition. 

In the switch OFF position, the square ring affects the performance of the antenna, and the isolation provided by the switches in the OFF condition also affects it. As a result, the resonant frequency of the antenna shifted to a value higher than 3.6 GHz, so the effect of this parasitic ring had to be compensated; we did this by increasing the dimensions of the inner patch while maintaining the 420 µm separation. 

After adjusting the dimensions in the switch OFF position, we turned the switches ON. In the ON condition, the switches introduce insertion loss, due to which the parameters of the antenna change again, and again we had to readjust the dimensions of the patch. 

fig. 3 Reconfigurable Antenna 



More problems were faced in the switch ON position than in the switch OFF position, because the antenna designed for 3.6 GHz (the smaller, inner patch) is used together with a square ring to work at 2.2 GHz (the combined structure). The dimensions of the quarter-wave transformer used for impedance matching remain the same in the switch ON and switch OFF positions, which causes the VSWR to increase in the switch ON position. The final dimensions of the antenna are: 

Switch OFF position: Frequency 3.62 GHz, Length 21.5mm, 
Width 21.5 mm, Transformer width 0.4506mm. 

Switch ON position: Frequency 2.2 GHz, Length 34.2mm, 
Width 34.2 mm. 

A few parameters of the antenna are shown below. 

fig. 5 Return loss in switch ON position (-27 dB approx.) 

fig. 6 VSWR in OFF state (1.05 approx.) 

fig. 7 VSWR in ON state (1.05 approx.) 

fig. 8 Switch in OFF state 

IV. RF-MEMS [4] 

The term RF MEMS refers to the design and fabrication of 
MEMS for RF integrated circuits. By utilizing 
electromechanical architecture on a miniature- (or micro-) 
scale, MEMS RF switches combine the advantages of 
traditional electromechanical switches (low insertion loss, 
high isolation, extremely high linearity) with those of solid-state switches (low power consumption, low mass, long lifetime). While the improvements in insertion loss (< 0.2 dB), isolation (> 40 dB), linearity (third-order intercept point > 66 dBm), and frequency bandwidth (dc-40 GHz) are remarkable, RF MEMS switches are slower and have lower power-handling capability than solid-state switches. 
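To put the decibel figures above in perspective, the following Python sketch converts them to power fractions. It simply applies P_out/P_in = 10^(-dB/10) to the insertion-loss and isolation values quoted in the text; the function name is illustrative.

```python
def db_to_fraction(loss_db: float) -> float:
    """Fraction of the incident power that gets through a loss of `loss_db` decibels."""
    return 10.0 ** (-loss_db / 10.0)


# Figures quoted above for RF MEMS switches.
passed_when_on = db_to_fraction(0.2)     # insertion loss < 0.2 dB
leaked_when_off = db_to_fraction(40.0)   # isolation > 40 dB

print(f"ON state:  at least {passed_when_on:.1%} of the power is passed")
print(f"OFF state: at most {leaked_when_off:.2%} of the power leaks through")
```

In other words, an insertion loss below 0.2 dB passes more than 95% of the power, and an isolation above 40 dB lets through less than one ten-thousandth of it.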

The switch used in our model is based on two 1 µm thick gold cantilevers with dimensions of 150×80 µm², with a central conductor (60×40 µm²) joining the two side cantilevers. The cantilever is suspended 101 µm above the substrate and 1 µm above the patch. The anchors supporting the structure are 4×4 µm² gold blocks, 101 µm thick (in switch OFF). A gold hinge (4×30 µm²) joins the anchor to the side cantilever. The metal strip (T-line) joining the two patches is 40 µm wide, with a 40 µm break in the middle of the T-line, such that the central conductor of the switch is placed just above it. 

To bring the switch into the ON state, the gap between the switch and the patch is eliminated by reducing the height of the components of the switch by 1 µm, and a voltage of 10 V is also applied at the actuation area defined below the side cantilever. In practice, this voltage pulls the cantilever down so that the switch closes. 




When the switch is turned ON, the central conductor fills the 
gap and acts as a bridge for the current to pass through the 
T-Line. In switch OFF position the break is maintained and 
there is no path for the current to pass. 

The two main parameters of the switch, namely insertion loss and isolation, are shown below. 




fig. 10 S11 (blue), S21 (red) in OFF state 




V. Discussion and conclusions 



The dual-frequency reconfigurable antenna element is suitable for use in communication applications. However, reliability and lifetime problems should be addressed before commercial use. 

This paper presents the design of a frequency-reconfigurable antenna that uses the ON/OFF state properties of a low-voltage-actuated MEMS switch. The two parameters of the switch, namely insertion loss and isolation, affect the performance of the antenna and hence have to be taken into account. Convincing experimental verification was carried out using a small 420×40 µm² piece of copper to model the ON state of the MEMS switch. The results were compared with theoretical ones, and good agreement between simulated and experimental results was achieved for this structure. 

Using good-quality RF MEMS switches and optimizing their location and performance can improve the operation of the antenna. 

Here, reconfiguration of only one parameter is presented; reconfiguration of other parameters, such as radiation pattern or polarization, is also possible. Reconfiguring more than one parameter with a single antenna is possible as well, but it greatly increases the complexity of the design. 
Abstract- The basic idea proposed in this paper is to determine the optimal congestion window for a TCP sender in a particular network setup (the window that corresponds to the fair share of that connection) and to keep this congestion window constant until the fair share in the network has changed considerably since the window was last calculated. At that point, the TCP congestion window is recalculated according to the new circumstances. The proposed mechanism is particularly effective over wireless links, which are inherently loss-prone: because the congestion window of the Modified TCP is independent of packet losses (whether corruption losses or congestion losses), the sender keeps transmitting at the same rate as before. 

I. Introduction 

The well-known challenge for the TCP congestion control algorithm [1], [2], [12] in a wired-cum-wireless environment is that it relies on packet loss as an indicator of network congestion. To ease congestion and avoid a congestion collapse, a TCP Reno sender reduces its congestion window (henceforth referred to as cwnd and expressed in number of segments) and refrains from sending packets. In the wired portion of the network, a congested router is the most likely cause of packet loss, while in the wireless portion a noisy, fading radio channel is the more likely cause. This creates problems for TCP Reno, since it cannot distinguish congestion losses from wireless losses. Approaches to address this problem have been discussed and compared in the work by Balakrishnan et al. [3]-[4], in which three alternative approaches, namely end-to-end (E2E), split-connection, and localized link-layer methods, were carefully contrasted. 

The split-connection approach [13]-[14] violates the semantics of E2E reliability. In addition, this approach requires a lot of state maintenance at the base station. 

In this paper, we propose a TCP sender-side modification of the TCP congestion control algorithm [5]. The crux of the idea is that, for a given network scenario, the Modified TCP sender determines its optimal fair share of the link bandwidth and sets its cwnd so that it can transmit at a rate that utilizes this fair share. After cwnd is set to a value optimal for a given network scenario, it is kept constant until the network scenario has changed enough to significantly alter the connection's fair share. Since cwnd is not decreased on any packet loss indication, such as a retransmission on receipt of a triple DUPKT or a coarse timeout caused by the expiration of the retransmission timer, the sender is not susceptible to performance degradation and cwnd reduction when stray packet losses occur. This leads to enhanced performance in the wireless domain, where losses are often not an indication of congestion but are caused by the inherently loss-prone nature of the radio propagation medium. We provide simulation results in support of our claim that a constant cwnd can outperform TCP Reno in network scenarios that remain static over a certain time interval. 
The rest of this paper is organized as follows: section 2 
summarizes some related work; section 3 gives the 
analytical approach; section 4 describes the algorithm used 
by the sender; section 5 summarizes the results obtained by 
the simulations; section 6 gives an idea of the challenges 
faced while implementing such a strategy; and finally, 
section 7 concludes the paper. 

II. Related work 

NCPLD [15] compares the measured rtt with the lowest rtt (or the rtt at the knee of the goodput-load curve). If the former is close to the latter, packet loss is assumed to be caused by wireless errors. TCPW [8]-[11] measures goodput (or reception rate) and uses that rate to set the congestion window whenever a packet is detected lost. If the current goodput is below a certain band around the mean, packet loss is attributed to congestion; otherwise, it is attributed to wireless errors. This paper uses the TCPW bandwidth estimation scheme and compares the performance of the Modified sender with that of the TCPW sender. The TCPW [8]-[11] sender monitors ACKs to estimate the bandwidth currently used by, and thus available to, the connection. More precisely, the sender uses (1) the ACK reception rate and (2) the information an ACK conveys regarding the amount of data delivered to the destination. The Westwood algorithm is described briefly below. 

Let us assume that an ACK is received at the source at time $t_k$, notifying that $d_k$ bytes have been received at the TCP receiver. We can measure the sample bandwidth used by that connection as $b_k = d_k/\Delta_k$, where $\Delta_k = t_k - t_{k-1}$ and $t_{k-1}$ is the time the previous ACK was received. The following discrete-time filter, obtained by discretizing a continuous low-pass filter using the Tustin approximation, is used: 

$$\hat{b}_k = \alpha_k \hat{b}_{k-1} + (1 - \alpha_k)\,\frac{b_k + b_{k-1}}{2}$$ 

where $\hat{b}_k$ is the filtered estimate of the available bandwidth at time $t = t_k$, $\alpha_k = (2\tau - \Delta_k)/(2\tau + \Delta_k)$, and $1/\tau$ is the cutoff frequency of the filter. 
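The filter above can be sketched in a few lines of Python. This is an illustrative implementation of the stated equations, not the TCPW module code from [16]; starting both the previous sample and the estimate at zero, and using the first ACK only to record a timestamp, are assumptions of the sketch.

```python
from typing import Optional


class WestwoodBandwidthFilter:
    """Tustin-discretized low-pass filter over ACK-based bandwidth samples.

    b_hat_k = a_k * b_hat_{k-1} + (1 - a_k) * (b_k + b_{k-1}) / 2,
    a_k = (2*tau - dt_k) / (2*tau + dt_k),  b_k = d_k / dt_k.
    """

    def __init__(self, tau: float):
        self.tau = tau                          # 1/tau is the cutoff frequency of the filter
        self.prev_time: Optional[float] = None  # t_{k-1}
        self.prev_sample = 0.0                  # b_{k-1} (assumed 0 before any sample)
        self.estimate = 0.0                     # b_hat_{k-1} (assumed 0 initially)

    def on_ack(self, t_k: float, d_k: float) -> float:
        """Update the estimate for an ACK at time t_k that acknowledges d_k bytes."""
        if self.prev_time is None:              # first ACK: no inter-ACK interval yet
            self.prev_time = t_k
            return self.estimate
        dt = t_k - self.prev_time
        if dt <= 0:
            raise ValueError("ACK times must be strictly increasing")
        b_k = d_k / dt                          # sample bandwidth in bytes per second
        a_k = (2 * self.tau - dt) / (2 * self.tau + dt)  # becomes negative if dt > 2*tau
        self.estimate = a_k * self.estimate + (1 - a_k) * (b_k + self.prev_sample) / 2
        self.prev_sample, self.prev_time = b_k, t_k
        return self.estimate


f = WestwoodBandwidthFilter(tau=0.5)
for k in range(2001):                           # ACKs every 10 ms, 1000 bytes each
    est = f.on_ack(k * 0.01, 1000)
print(f"estimated bandwidth = {est:.0f} bytes/s")
```

With ACKs arriving every 10 ms and each acknowledging 1000 bytes, every sample is 100 000 bytes/s, and the filtered estimate converges to that value at a speed set by τ.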

Algorithm after n duplicate ACKs 

The pseudo code of the algorithm is the following: 

if (n duplicate ACKs are received) 
    ssthresh = (BWE * RTTmin) / seg_size; 
    if (cwin > ssthresh)   /* congestion avoidance */ 
        cwin = ssthresh; 
    endif 
endif 



Here, seg_size identifies the length of the payload of a TCP 
segment in bits. 



Algorithm after coarse timeout expiration 
The pseudo code of the algorithm is: 

if (coarse timeout expires) 
    ssthresh = (BWE * RTTmin) / seg_size; 
    if (ssthresh < 2) 
        ssthresh = 2; 
    endif 
    cwin = 1; 
endif 
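The two Westwood reactions in the pseudo code above can be expressed as a small Python class. This is a sketch, not the TCPW source: the class and method names are hypothetical, BWE is taken in bits per second to match seg_size in bits, and fractional window values are kept as computed because the pseudo code does not say whether to round them.

```python
from dataclasses import dataclass


@dataclass
class WestwoodWindow:
    """Westwood's response to n duplicate ACKs and to a coarse timeout."""
    cwin: float        # congestion window, in segments
    ssthresh: float    # slow-start threshold, in segments
    seg_size: float    # segment length in bits, as in the text

    def _bwe_window(self, bwe: float, rtt_min: float) -> float:
        # BWE (bits/s) * RTTmin (s) / seg_size (bits) gives a window in segments.
        return bwe * rtt_min / self.seg_size

    def on_n_dupacks(self, bwe: float, rtt_min: float) -> None:
        self.ssthresh = self._bwe_window(bwe, rtt_min)
        if self.cwin > self.ssthresh:          # shrink only if above the estimate
            self.cwin = self.ssthresh

    def on_coarse_timeout(self, bwe: float, rtt_min: float) -> None:
        self.ssthresh = max(2.0, self._bwe_window(bwe, rtt_min))  # floor of 2 segments
        self.cwin = 1.0                        # restart from one segment


w = WestwoodWindow(cwin=20.0, ssthresh=64.0, seg_size=8000.0)
w.on_n_dupacks(bwe=1_000_000.0, rtt_min=0.1)   # 1 Mbit/s * 0.1 s / 8000 bits = 12.5 segments
print(w.cwin, w.ssthresh)
```

Unlike Reno, neither reaction halves the window blindly: both set ssthresh from the measured bandwidth and the minimum rtt.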



III. Analytical approach 

The logic for using a constant window can be summarized as follows. 

As in [1], if we measure the network load by the average queue length over fixed intervals of some appropriate length, and let $L_i$ be the load at instant $i$, then for a congested network we have 

$$L_i = N + \gamma L_{i-1} \qquad (1)$$ 

where $N$ (a constant) accounts for the average arrival rate of new traffic, and $\gamma L_{i-1}$ accounts for the traffic left over from the last time interval. The term $\gamma L_{i-1}$ arises when the sender is sending at a rate greater than its fair share, so that a fraction of the packets from the previous round remains in the network when the packets from the next round arrive. If, instead, the sender is sending at a rate that matches its fair share, the term $\gamma L_{i-1}$ vanishes, and equation (1) reduces to 

$$L_i = N \qquad (2)$$ 

which is a constant; this forms the basis for the use of a constant congestion window. 
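Iterating the recursion $L_i = N + \gamma L_{i-1}$ of equation (1) makes the argument concrete. The following Python sketch assumes $\gamma$ stays constant across intervals; the values N = 10 and γ = 0.5 are purely illustrative.

```python
def load_sequence(n_new: float, gamma: float, l0: float, steps: int) -> list[float]:
    """Iterate L_i = N + gamma * L_{i-1}, starting from L_0 = l0."""
    loads, l_prev = [], l0
    for _ in range(steps):
        l_prev = n_new + gamma * l_prev
        loads.append(l_prev)
    return loads


print(load_sequence(10.0, 0.0, 0.0, 4))   # gamma = 0 (sending at the fair share): [10.0, 10.0, 10.0, 10.0]
print(load_sequence(10.0, 0.5, 0.0, 4))   # gamma > 0 (leftover traffic):           [10.0, 15.0, 17.5, 18.75]
```

With γ = 0 the load stays at N from the first interval on; with 0 < γ < 1 leftover traffic accumulates and the load settles at N/(1 − γ), which is larger than N, and for γ ≥ 1 it grows without bound.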



IV. TCP modifications 

The key idea here [5] is that we can divide the entire lifetime of a TCP connection into a finite number of slots such that the connection's fair share in the network remains almost the same within a slot, i.e., we may assume that the network scenario remains almost static within such a slot. A change in the available share of the network, due to some connections leaving the network or some new connections joining, ends a slot and marks the beginning of the next slot. Our proposal is to use a constant TCP congestion window during these slots, in which the network scenario is assumed to remain unchanged. The beginning of a new slot triggers a window recalculation, and cwnd is set according to the connection's available share in that slot. 

In the proposed mechanism, we use a bandwidth estimation algorithm similar to that of TCPW to obtain an estimate of the available fair share. A change in the rtt measurements is used as a trigger to move from the constant window phase to the recalculation phase (our model uses the knee region of the rtt curve, as in [15], to detect a change in fair share and trigger recalculation). In our model, the Modified TCP sender moves through three distinct phases during its lifetime: the startup phase, followed by mutually interleaved window recalculation and constant window phases. The three phases are described in some detail below. 

A. The Startup Phase 

At connection setup, the sender has no inkling of the network scenario. To keep the mechanism dynamic, the sender refrains from using typical default values for the essential attributes of the connection. It uses a slow start mechanism as in [1] and continues the slow start process for, say, k rounds, during which it acquires vital information about the network, such as the minimum rtt and a measure of the bandwidth available to the connection. After the first k rounds, the sender has acquired enough information about the network to calculate cwnd for the first time. 

B. Window Recalculation Phase 

When the trigger detects a change in the available fair share, the TCP sender enters this phase. This is the most crucial phase of the connection, because it is here that the cwnd to be held constant during the next phase is calculated. Hence the performance of the sender, i.e., how well it utilizes its share of the network, depends on the calculated cwnd. Along with the window recalculation, the current value of the smoothed rtt, obtained by passing the coarse rtt measurements of individual segments through a low-pass filter as suggested by Jacobson [1], is archived for future reference. 

An efficient bandwidth estimation algorithm must be in place to determine the fair share of the connection in the network. The accuracy of this algorithm in determining the network share determines the performance of the Modified TCP sender. 

C. The Constant Window Phase 

During this phase of the connection, cwnd is kept constant irrespective of the number of ACKs received or any indication of packet loss, such as a DUPKT or a coarse timeout. The sender keeps track of the rtt estimates from the segments that have been delivered. If the fractional change of the current smoothed rtt, $rtt_{var}$, relative to the archived value, $rtt_{arc}$, is greater than a specified threshold $\beta$, the sender exits the constant window phase and enters the window recalculation phase; i.e., a window recalculation is made if 

$$\frac{|rtt_{arc} - rtt_{var}|}{rtt_{arc}} > \beta$$ 

The algorithm's pseudo code is as follows: 

if (slow_start_state) 
{ 
    slow_start();   /* open cwnd by one segment on each ACK arrival */ 
} 
else 
{ 
    if (|rtt_arc - rtt_var| / rtt_arc > β)   /* fractional change greater than threshold */ 
    { 
        /* recalculate window and archive the value of rtt */ 
        cwnd_ = (Estimated_Bandwidth * rtt_min) / seg_size_; 
        if (cwnd_ < 1) cwnd_ = 1; 
        rtt_arc = rtt_var; 
    } 
} 



In the pseudo code, seg_size_ identifies the length of the TCP segments in bytes; rtt_min is the estimated minimum value of rtt over the lifetime of the particular connection; and Estimated_Bandwidth is the bandwidth estimate obtained by some bandwidth estimation algorithm. 
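The three phases and the pseudo code above can be combined into one Python sketch of the sender's window logic. Several details are assumptions, not the authors' implementation: the startup phase is counted in ACKs (`k_acks`) rather than rounds, the threshold `beta = 0.2` is a placeholder, the smoothed rtt uses an exponentially weighted moving average with Jacobson's gain of 1/8, and the bandwidth estimate (bytes/s) is passed in by the caller, since the text leaves the estimation algorithm open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConstantCwndSender:
    """Sketch of a three-phase sender: startup, window recalculation, constant window."""
    seg_size: int                     # segment length in bytes (seg_size_)
    k_acks: int = 20                  # hypothetical startup length, counted in ACKs
    beta: float = 0.2                 # hypothetical threshold on |rtt_arc - rtt_var| / rtt_arc
    gain: float = 0.125               # EWMA gain for the smoothed rtt (assumed 1/8)
    cwnd: float = 1.0                 # congestion window in segments
    rtt_var: Optional[float] = None   # current smoothed rtt
    rtt_arc: Optional[float] = None   # smoothed rtt archived at the last recalculation
    rtt_min: float = float("inf")
    acks_seen: int = 0
    slow_start: bool = True

    def _recalculate(self, bandwidth: float) -> None:
        """Window recalculation phase: cwnd = bandwidth * rtt_min / seg_size, at least 1."""
        self.cwnd = max(1.0, bandwidth * self.rtt_min / self.seg_size)
        self.rtt_arc = self.rtt_var   # archive the smoothed rtt

    def on_ack(self, rtt_sample: float, bandwidth: float) -> float:
        """Process one ACK; `bandwidth` comes from an external estimator."""
        self.rtt_min = min(self.rtt_min, rtt_sample)
        if self.rtt_var is None:
            self.rtt_var = rtt_sample
        else:
            self.rtt_var = (1 - self.gain) * self.rtt_var + self.gain * rtt_sample
        self.acks_seen += 1
        if self.slow_start:
            self.cwnd += 1.0                       # open cwnd by one segment per ACK
            if self.acks_seen >= self.k_acks:      # startup over: first window calculation
                self.slow_start = False
                self._recalculate(bandwidth)
        elif abs(self.rtt_arc - self.rtt_var) / self.rtt_arc > self.beta:
            self._recalculate(bandwidth)           # fair share appears to have changed
        return self.cwnd                           # otherwise: constant window phase

    def on_loss(self) -> float:
        """Duplicate ACKs or a coarse timeout leave cwnd unchanged in this scheme."""
        return self.cwnd


s = ConstantCwndSender(seg_size=1000, k_acks=3)
for _ in range(3):
    cw = s.on_ack(0.1, 100_000.0)   # startup ends: cwnd = 100000 * 0.1 / 1000 = 10 segments
print(cw, s.on_loss())
```

Losses never touch cwnd here; only a sufficiently large shift of the smoothed rtt away from the archived value triggers a new window, which is the behavior the constant window phase describes.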

V. Performance analysis 

In this section, we report on the basic performance behavior of the modified TCP senders and their fairness among a number of connections sharing a bottleneck link. A performance comparison is made with TCP Reno and TCP Westwood [8]-[11] sources operating in similar network scenarios. Following the literature studied, the buffer capacity of intermediate nodes is always set equal to the bandwidth-delay product of the bottleneck link; increasing the buffer capacity further does not have any impact on performance [15]. The traffic model used is FTP with infinite data, so that the sender has data to send whenever the network permits, and the packet size is set to 1000 bytes (1040 bytes with headers) in all experiments. In our simulations, we have used the conventional TCP Sink, which responds with an ACK for every packet received. There is no congestion or error in the ACK path. All simulations have been carried out for a period of 250 seconds, with the TCP senders transmitting data for the entire period of simulation. All simulations use an 802.11 MAC with a maximum available bandwidth of 1 Mbps. A Two Ray Ground propagation model is used with an omni-directional antenna. The wired subnet is error free, while the wireless subnet is prone to varying error rates. 
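Setting the buffer equal to the bandwidth-delay product is simple arithmetic, sketched below in Python. The 1040-byte packet size comes from the text; the bottleneck bandwidth (1 Mbps) and the 100 ms round-trip time are hypothetical, since the bottleneck parameters are not given here, and using the round-trip rather than the one-way delay and rounding up to whole packets are assumptions of the sketch.

```python
import math


def bdp_buffer_packets(bandwidth_bps: float, rtt_s: float, packet_bytes: int = 1040) -> int:
    """Buffer size, in packets, equal to the bandwidth-delay product of a link."""
    bdp_bytes = bandwidth_bps * rtt_s / 8.0        # bits -> bytes
    return math.ceil(bdp_bytes / packet_bytes)     # round up to whole packets


# Hypothetical bottleneck: 1 Mbps with a 100 ms round-trip time.
print(bdp_buffer_packets(1_000_000, 0.100))
```

Here the product is 12 500 bytes, which needs 13 buffers of 1040-byte packets.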




Figure 1: Network scenario used for simulation 

The performance of the Modified TCP senders, TCP Westwood, and TCP Reno has been compared based on the throughput metric, i.e., the number of data packets received at the receiver. We have analyzed the performance of the constant congestion window aspect of the Modified TCP to show that, in time slots when the share of a connection in the network remains unchanged, a constant cwnd TCP outperforms the Reno and Westwood sources in situations with wireless errors. Our assumption is that the share of a connection remains unchanged during the entire period of simulation. The optimal cwnd for a given scenario has been evaluated, and the cwnd of the modified TCP sender has been set accordingly. Note that we are not simulating the entire lifetime of a modified TCP sender; rather, our analysis concentrates only on the constant congestion window phase of the connection. 

All simulations in this paper have been carried out using the 
LBL network simulator ns2 [6], [7] with appropriate 
modifications for implementation of the changes in the 
modified TCP sender. For comparison with TCP Westwood, 
the corresponding TCPW modules were used [16]. 

Figure 1 shows a schematic of the scenario used for 
simulation. A number of TCP connections share a common 
wired bottleneck that connects the intermediate router to the 
base station. When there is only one TCP connection in the 
network, there is no loss due to congestion. As a result, any 
packet loss is due to wireless errors. Hence, we can evaluate 
the performance of the Modified congestion control 
algorithm in scenarios where wireless loss is the only cause 
for packet loss. As the number of source/receiver pairs increases, the wired link between the router and the base station gradually becomes congested, so packets are lost due to both congestion and wireless errors. Hence, the performance of the Modified TCP sender in congested networks can also be evaluated using the same scenario. 
A. Constant Bit Error Rates 

In the scenarios under consideration, the wireless subnet is 
prone to constant bit error rates. Figure 2 compares the 
performance of the Constant cwnd senders for different 
values of cwnd. As is evident, for every scenario, there 
exists a value of cwnd (in some cases more than one) for 
which the performance of the TCP Sender is maximum. 
This is the optimal cwnd for the given network scenario. In 
figure 2, cwnd is expressed in segments. 

As is evident from figure 3, a Constant cwnd TCP 
outperforms the Reno and Westwood senders operating in 
similar network conditions. A 10-15% increase in 
throughput has been obtained as is evident from figure 3. In 
figure 3, the error rates are expressed as percentages. Figure 4 compares the performance of the TCP variants for multiple connections sharing the wired bottleneck, where packets are also lost due to congestion. The constant cwnd TCP sender outperforms Reno and Westwood in such scenarios as well. 




Figure 2: Variation of Throughput with varying cwnd for 
various bit error rates for single S/R pair. 




Figure 3: Variation of throughput with varying error rates in 
scenarios with constant bit error rates for single S/R pair 




Figure 4: Variation of Throughput with number of TCP 
connections sharing the link in scenarios with 5% loss in 
constant bit error 




Figure 5: Variation of throughput with varying cwnd for various burst error rates for a single S/R pair 

B. Burst Error 



This subsection compares the performance of Modified TCP, TCP Reno, and TCP Westwood based on the throughput metric. In the scenarios under consideration, the wireless subnet is prone to burst errors. The burst error is modeled using a discrete-time first-order Markov model. The pattern of errors is described by the transition matrix 

$$M = \begin{pmatrix} p_{BB} & p_{BG} \\ p_{GB} & p_{GG} \end{pmatrix}$$ 

where $p_{BG}$ is the transition probability from bad to good, i.e., the conditional probability that a successful transmission occurs in a slot given that a failure occurred in the previous slot; the other entries of the matrix are defined similarly. Note that $1/(1 - p_{BB})$ represents the average length of a burst of errors, which is described by a geometric random variable. 
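The burst-length property of this two-state model can be checked by simulation. The Python sketch below treats every Bad slot as an error slot; the values p_BB = 0.8 and p_GB = 0.05 are illustrative, not the error rates used in the paper's simulations, and starting in the Good state is an assumption.

```python
import random


def simulate_two_state_channel(p_bb: float, p_gb: float, n_slots: int, seed: int = 0) -> list[bool]:
    """First-order Markov error channel; True marks a slot in the Bad (error) state."""
    rng = random.Random(seed)
    bad = False                           # assume the channel starts in the Good state
    states = []
    for _ in range(n_slots):
        if bad:
            bad = rng.random() < p_bb     # stay Bad with probability p_BB
        else:
            bad = rng.random() < p_gb     # move Good -> Bad with probability p_GB
        states.append(bad)
    return states


def mean_burst_length(states: list[bool]) -> float:
    """Average length of maximal runs of consecutive error slots."""
    bursts, run = [], 0
    for is_bad in states:
        if is_bad:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


slots = simulate_two_state_channel(p_bb=0.8, p_gb=0.05, n_slots=200_000, seed=1)
print(f"simulated mean burst = {mean_burst_length(slots):.2f}, 1/(1 - p_BB) = {1 / (1 - 0.8):.2f}")
```

Each burst continues for another slot with probability p_BB, so burst lengths are geometric and their mean approaches 1/(1 − p_BB) = 5 slots for these values.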

Figure 5 compares the performance of the Constant cwnd 
senders for different values of cwnd. As is evident, for every 
scenario, there exists a value of cwnd (in some cases more 
than one) for which the performance of the TCP Sender is 
maximum. This is the optimal cwnd for the given network 
scenario. In figure 5, cwnd is expressed in segments. 

When comparing the performance of the Constant cwnd 
TCP Sender, the cwnd is set to the optimal value for the 
given scenario. As is evident from figure 6, a Constant cwnd 
TCP outperforms the Reno and Westwood senders operating 
in similar network conditions. A 10-20% increase in 
throughput has been obtained as is evident from figure 6. In 
figure 6, the error rates are expressed as percentage. 

Figure 7 compares the performance of the TCP variants for multiple connections sharing the wired bottleneck, where packets are also lost due to congestion. The constant cwnd TCP sender outperforms Reno and Westwood in such scenarios as well. 




Figure 6: Variation of throughput with varying error rates in 
scenarios with burst error and for single S/R pair 
Figure 7: Variation of Throughput with number of TCP 
connections sharing the link in scenarios with 5% loss in 
burst error 

VI. Challenges in implementing the proposed mechanism 

In the earlier sections of the paper, we have proposed a sender-side modification of the TCP congestion control algorithm. There are certain challenges that need to be met for this mechanism to work efficiently. First, the bandwidth estimation algorithm must be able to calculate precisely the available fair share of the connection in the network; an incorrect estimate would negate the performance enhancement gained by not reducing the window in case of wireless errors. Second, the triggering mechanism must be able to detect efficiently a change in the connection's available fair share of the bandwidth. Failure to do so would lead to potential over- or under-utilization of the available fair share when the fair share of the connection decreases or increases, respectively. 

VII. Conclusion and future work 

In this paper, we propose a sender-side modification of the TCP congestion control algorithm. In addition, we have evaluated and compared the performance of the modified TCP sender during a particular phase of its lifetime, viz. the constant congestion window phase. The simulations performed have shown a throughput enhancement of 10-15% compared to TCP Reno and Westwood in cases with constant bit error rates, and about 10-20% in cases with burst errors corresponding to a discrete-time first-order Markov model. 

One important aspect of the operation of this modified TCP is that cwnd should be set to a value that is optimal for a given connection. For this purpose, an efficient bandwidth estimation algorithm needs to be designed that dynamically determines a connection's fair share based on certain observed and measured parameters. We are working on deriving a function that would dynamically determine cwnd during the window recalculation phase. 
Abstract- This paper is a survey of recent work in the field of Web recommendation systems, for the benefit of research on the adaptability of information systems to the needs of their users. This issue is becoming increasingly important on the Web, as non-expert users are overwhelmed by the quantity of information available online, while commercial Web sites strive to add value to their services in order to create loyal relationships with their visitors (customers). This article aims to remedy the negative effects of the traditional one-size-fits-all approach by enhancing the system's ability to adapt its own behavior to the user's characteristics, such as goals, tasks, and interests, which are stored in user profiles, through a variety of algorithms. The enormous amount of information on the World Wide Web makes it an obvious candidate for Web recommendation system research, since Web-based applications face large amounts of data. To produce portal usage patterns and user behaviors, a Web recommendation system consists of three main phases, namely data preprocessing, pattern discovery, and pattern analysis. Server log files form the raw data, which must pass through all the Web recommendation system phases to produce the final results. Here, the Web recommendation approach is combined with basic association rules, using the Apriori algorithm, to optimize the content of the e-application portal. Finally, this paper presents an overview of the results analysis, and the findings can be used to take suitable, valuable actions. 

I. Introduction 

An abundant amount of information is created and
delivered over electronic media. Users risk becoming
overwhelmed by the flow of information, and they lack
adequate tools to help them manage the situation.
Information filtering (IF) is one of the methods that are
rapidly evolving to manage large information flows. The
aim of IF is to expose users only to information that is
relevant to them. Many IF systems have been developed in
recent years for various application domains. Information
filtering systems can help users by eliminating irrelevant
information and by bringing relevant information to the
user's attention. Filters are mediators between the sources of
information and their end-users.

The system is based on a user modeling component [21],
designed for building and maintaining long-term models of
individual Internet users. Presently the system acts as an
intelligent interface for Web search engines. The
experimental results we have obtained are encouraging and
support the choice of adaptive Information Filtering. The
main goal of information filtering is the management of
information overload and an increase of the semantic
signal-to-noise ratio. To do this, the user's profile is compared
to some reference characteristics. These characteristics may
originate from the information item (the content-based
approach) or from the user's social environment (the
collaborative filtering approach). Whereas in information
transmission electronic filters are used against
syntax-disrupting noise at the bit level, the methods employed
in information filtering act at the semantic level. The range
of machine methods employed builds on the same principles
as those for information extraction [1]. A notable application
can be found in email spam filters. Thus, it is not only the
information explosion that necessitates some form of filter,
but also inadvertently or maliciously introduced
pseudo-information.

The different systems use various methods, concepts, and
techniques from diverse research areas such as Information
Retrieval, Artificial Intelligence, and Behavioral Science.
The systems differ in scope, functionality, and platform.
There are many systems of widely varying philosophies, but
all share the goal of automatically directing the most
valuable information to users in accordance with their User
Model, and of helping them use their limited reading time
optimally.

When a user interacts with the system for the first time, the
user model needs to be built from scratch. To build a reliable
model quickly, the user is given an interview in which he or
she expresses an interest score for each of the domain
categories. The user submits a query to the system, which in
turn posts it to an external WWW search engine, obtaining
documents that are filtered and returned to the user. In the
filtering process the system works at two different levels of
refinement: a first, coarse one, and a more elaborate step
that takes place only if the first stage succeeds. During
normal use the system offers a series of panels, the first of
which is the filtering panel [19]. On its left it shows the list
of documents retrieved by the search engine for the user's
query.
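The two-level filtering described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the coarse stage counts how many of the user's profile terms occur in a document and that the elaborate stage compares bag-of-words vectors with cosine similarity; the function names and the thresholds `min_hits` and `min_sim` are hypothetical and are not taken from the system discussed here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_documents(docs, profile_terms, profile_vec, min_hits=1, min_sim=0.2):
    """Return the documents that pass both the coarse and the elaborate stage."""
    kept = []
    for doc in docs:
        words = Counter(doc.lower().split())
        # Coarse stage: a cheap check on how many profile terms appear.
        hits = sum(1 for term in profile_terms if term in words)
        if hits < min_hits:
            continue  # the elaborate stage runs only when this one succeeds
        # Elaborate stage: compare the whole term vector with the profile.
        if cosine(words, profile_vec) >= min_sim:
            kept.append(doc)
    return kept
```

A document that mentions none of the profile terms is rejected by the coarse stage without ever being scored, which is where the saving of the two-level design comes from.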

For easier use, the system automatically sorts the document
lists to help the user locate the best documents. The user
opens the needed documents by double-clicking on them, and
can then express simple feedback [15] using one of three
values: very good, good, or bad, which keeps the burden on
the user low, as recommended. In this way the system can
modify the user model according to the user's preferences
[3, 4]. Furthermore, a browser of the system's objects has
been provided to let the user inspect all of the system's data
structures through an effective graphical interface, shortening
the semantic gap between the user and the system. In the next
section the user modeling component is presented.
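A minimal sketch of how the three feedback values could adjust a category-level user model is shown below. The numeric weights, the learning rate `rate`, the neutral starting score of 0.5 for unseen categories and the clipping to [0, 1] are all assumptions made for illustration; the text does not specify how the system maps feedback to model updates.

```python
# Hypothetical mapping of the three feedback values to score adjustments.
FEEDBACK_WEIGHT = {"very good": 1.0, "good": 0.5, "bad": -1.0}


def update_profile(profile, category, feedback, rate=0.1):
    """Nudge the interest score of `category` and keep it within [0, 1].

    Categories the user has never rated start from a neutral 0.5 (assumption).
    """
    score = profile.get(category, 0.5) + rate * FEEDBACK_WEIGHT[feedback]
    profile[category] = min(1.0, max(0.0, score))
    return profile
```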



II. Recommendation systems 




Recommender systems are active information filtering 
systems that attempt to present to the user information items 
(movies, music, books, news, web pages) the user is 
interested in. These systems add information items to the 
information flowing towards the user, as opposed to 
removing information items from the information flow 
towards the user. Typically, a recommender system
compares the user's profile to some reference characteristics
and seeks to predict the rating that a user would give to an
item he or she has not yet considered [20]. Most recommender
systems use collaborative filtering approaches or a
combination of collaborative filtering and content-based
filtering approaches, although purely content-based
recommender systems do exist [7].
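To make rating prediction concrete, the following sketch implements one common form of user-based collaborative filtering: neighbours are weighted by their Pearson correlation with the active user, and their mean-centred ratings are averaged. This is a textbook variant chosen for illustration, not the method of any particular system cited here; ratings are assumed to be stored as nested dictionaries `{user: {item: rating}}`.

```python
import math


def pearson(ra, rb):
    """Pearson correlation between two users over the items both have rated."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Active user's mean plus the similarity-weighted, mean-centred
    ratings that the other users gave to `item`."""
    ru = ratings[user]
    mean_u = sum(ru.values()) / len(ru)
    num = den = 0.0
    for other, ro in ratings.items():
        if other == user or item not in ro:
            continue
        w = pearson(ru, ro)
        mean_o = sum(ro.values()) / len(ro)
        num += w * (ro[item] - mean_o)
        den += abs(w)
    return mean_u + num / den if den else mean_u
```

Because neighbours can be negatively correlated, predictions may fall slightly outside the rating scale and are often clipped in practice.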

Web-based Recommender Systems (RS) have recently been
applied to provide different types of customized information
to their users. Recommender Systems are applied in many
areas, such as web browsing, information filtering, net-news
or movie recommendation, and e-Commerce. The central
element of all recommender systems is the user model, which
contains knowledge about the individual preferences that
determine his or her behavior in the complex environment of
web-based systems. User modeling and RS are both
characterized by the cross-fertilization of various research
fields, such as Information Retrieval, Artificial Intelligence,
Knowledge Representation, Discovery and Data/Text Mining,
Computational Learning, and Intelligent and Adaptive Agents.
The changing information environment, which combines
various users, their needs and contexts of use, and different
system platforms, necessitates the application of recommender
systems.

The ever-increasing importance of e-Commerce in the global
economy also increases the importance of web-based RS.
RS have been developed in different domains, such as
personal agents and adaptive hypermedia. A personalized
hypermedia application is defined as a hypermedia system
that adapts the content, structure, and/or presentation of web
objects to each individual user's model. RS are applied in
many different areas, from web browsing to purchase
recommendation. Montaner et al. present a comprehensive
taxonomy of recommender agents. This taxonomy considers
two dimensions: profile generation and maintenance, and
profile exploitation. The dimension of profile generation and
maintenance considers the following elements: user profile
representation, initial profile generation, profile learning
technique, and relevance feedback [22].

III. Classification of recommendation systems

Many groups have built various types of systems that 
recommend pages to web users. This section will summarize 
several of those systems, and discuss how they differ from 
our approach. The objective of collecting user information is 
to create a profile that describes user characteristics. The
most common techniques are explicit profiling, implicit
profiling, and the use of legacy data:

Explicit profiling: Each user is asked to fill in a form when 
visiting the web site. This method has the advantage of 
letting users specify directly their interests. 

Implicit profiling: The user's behavior is tracked
automatically by the system. This method is generally
transparent to the user. Often, the user's registration
information is saved in a so-called cookie, which is kept by
the browser and updated at each visit. Behavior information
is generally stored in a log file.

Legacy data: Legacy data provide a rich source of
profile information for known users.
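Implicit profiling usually starts with preprocessing the server log. The sketch below assumes Apache-style Common Log Format lines and, as a simplification, uses the client host as a stand-in for the user (a cookie identifier would normally be more reliable). It keeps only successful GET requests for pages and groups them into one transaction per host; the list of skipped extensions is an illustrative choice.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "method path protocol" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')


def build_transactions(lines, skip_ext=(".gif", ".jpg", ".png", ".css", ".js")):
    """Group successfully served page requests by client host."""
    pages = defaultdict(set)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # malformed line
        host, _time, method, path, status = m.groups()
        if method != "GET" or status != "200" or path.lower().endswith(skip_ext):
            continue  # drop failed requests and embedded resources
        pages[host].add(path)
    return dict(pages)
```

The resulting per-host page sets are the kind of transactions that association rule mining can then consume.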

IV. A SURVEY ON WEB RECOMMENDATION 

Piatetsky-Shapiro et al. [5] describe personalization as the
process of gathering and storing information about the visitors
of a web site, analyzing the stored information, and, based on
this analysis, delivering the right information to each visitor
at the right time. A personalization component should be
capable of recommending documents and/or other web sites,
promoting products, giving appropriate advice, targeting
e-mail, etc. Personalization is increasingly used as a means
to expedite the delivery of information to a visitor, making
the site useful and attractive so that the visitor is stimulated
to return to it. For this reason, personalization is one of the
key features of e-business web sites.

A personalization component builds and exploits models or 
profiles of the users interacting with the system. A user 
profile is a (possibly structured) representation of the
characteristics of that user, intended to take into account his
or her needs, goals, and interests.

A. Recommendation System Using Apriori Algorithm 

R. Agrawal et al. [2] discuss recommendation systems based on
the Apriori algorithm, a classic algorithm for learning
association rules [13]. Apriori is designed to operate on
databases containing transactions (for example, collections of
items bought by customers, or details of website visits). Other
algorithms are designed for finding association rules in data
having no transactions (Winepi and Minepi), or having no
timestamps (DNA sequencing).

As is common in association rule mining, given a set of 
itemsets (for instance, sets of retail transactions, each listing 
individual items purchased), the algorithm attempts to find 
subsets that are common to at least a minimum number C
of the itemsets. Apriori uses a bottom-up approach, where
frequent subsets are extended one item at a time (a step 
known as candidate generation), and groups of candidates 
are tested against the data. The algorithm terminates when 
no further successful extensions are found. 
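The level-wise search described above can be sketched in Python as follows. The join step here is a simplified variant that unions any two frequent (k-1)-itemsets whose union has exactly k items, rather than the sorted-prefix join of the original algorithm. The prune step and the support test against the minimum count C (here `min_count`) follow the description in the text.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every itemset that
    occurs in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets ...
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # ... and prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Test the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result
```

The loop stops as soon as a level produces no frequent itemsets, which matches the termination condition stated above.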

B. Rule-Based Techniques 

Rule-based techniques exploit a set of rules specified in the 
system in order to drive personalization. Cross-selling is an 
e-business example of the rule-based technique: a rule could 
be specified to offer product X to a customer who has just 
bought product Y. For example, a customer who has bought a
book might be interested in current or previous books by the
same author, or in books on the same subject.
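A rule-based cross-selling component can be as simple as a list of condition/offer pairs checked against the customer's purchases. The rules and product names below are hypothetical examples, not rules from any cited system.

```python
# Hypothetical hand-written rules: if the customer bought every item in the
# condition set, offer the associated product.
RULES = [
    ({"novel_vol1"}, "novel_vol2"),
    ({"camera"}, "memory_card"),
    ({"camera", "tripod"}, "remote_shutter"),
]


def cross_sell(purchased, rules=RULES):
    """Return offers whose conditions are met and which the customer lacks."""
    offers = []
    for condition, offer in rules:
        if condition <= purchased and offer not in purchased and offer not in offers:
            offers.append(offer)
    return offers
```

Unlike association rules mined with Apriori, these rules are specified by hand, so their quality depends entirely on the domain knowledge of whoever writes them.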

C. Item-Based Collaborative Filtering 

Badrul M. Sarwar et al. proposed a different approach in the
area of filtering algorithms [28], [29], based on item relations
rather than on user relations as in classic Collaborative
Filtering. In the Item-based Collaborative Filtering algorithm,
we look at the set of items that the active user has rated,
compute how similar they are to the target item, and then
select the k most similar items {i1, i2, ..., ik}, based on their
corresponding similarities {si1, si2, ..., sik}. The predictions
can then be computed by taking a weighted average of the
active user's ratings on these similar items. The first step in
this approach is Representation. Its purpose is the same as in
the classic Collaborative Filtering algorithm: to represent the
data in an organized manner.

The second step is Item Similarity Computation. The basic
idea in this step is to first isolate the users who have rated
both items ij and ik, and then apply a similarity computation
technique to determine their similarity. Various ways to
compute this similarity have been proposed.
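The two steps described above, item similarity computation and weighted-average prediction, can be sketched as follows. Adjusted cosine similarity is used here as one of the proposed similarity measures. Keeping only positively similar neighbours is an additional simplification of this sketch, and the nested-dictionary rating layout `{user: {item: rating}}` is an assumption.

```python
import math


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity of items i and j over the users who rated
    both, after subtracting each user's mean rating."""
    num = norm_i = norm_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di, dj = user_ratings[i] - mean, user_ratings[j] - mean
            num += di * dj
            norm_i += di * di
            norm_j += dj * dj
    return num / math.sqrt(norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_based(ratings, user, target, k=2):
    """Weighted average of the user's ratings on the k items most similar to
    `target`; only positively similar items are kept (a simplification)."""
    rated = ratings[user]
    sims = sorted(
        ((adjusted_cosine(ratings, target, item), item) for item in rated if item != target),
        reverse=True,
    )
    top = [(s, item) for s, item in sims if s > 0][:k]
    den = sum(s for s, _ in top)
    if not den:
        return None  # no usable neighbours
    return sum(s * rated[item] for s, item in top) / den
```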

D. Content-Boosted Collaborative Filtering

Emmanouil G. Vozalis et al. [31] explain that the basic idea
behind Content-Boosted Collaborative Filtering is to use a
content-based predictor to enhance existing user data,
expressed via the user-item matrix R, and then provide
personalized suggestions through collaborative filtering. The
content-based predictor is applied to each row of the initial
user-item matrix, corresponding to each separate user, and
gradually generates a pseudo user-item matrix PR. In the end,
each row i of the pseudo user-item matrix PR consists of the
ratings provided by user ui, when available, and the ratings
predicted by the content-based predictor otherwise.
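The construction of the pseudo user-item matrix PR can be sketched as below. The content-based predictor here is a deliberately simple, hypothetical one (the mean of the user's ratings on items that share a feature with the target item); any content-based model could take its place. The collaborative filtering step that would then run on PR is not shown.

```python
def content_predict(user_ratings, item_features, item):
    """Toy content-based predictor (hypothetical): the mean of the user's
    ratings on items sharing at least one feature with `item`, falling back
    to the user's overall mean. Assumes the user has at least one rating."""
    related = [r for i, r in user_ratings.items() if item_features[i] & item_features[item]]
    pool = related or list(user_ratings.values())
    return sum(pool) / len(pool)


def pseudo_matrix(R, item_features):
    """Build PR: actual ratings where available, content predictions elsewhere."""
    return {
        user: {
            item: row[item] if item in row else content_predict(row, item_features, item)
            for item in item_features
        }
        for user, row in R.items()
    }
```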

Memory-based filtering algorithms include the basic
Collaborative Filtering algorithm [30], Item-based
Collaborative Filtering [28], and the Algorithm using
SVD/LSI for Prediction Generation [31]. Correlation-based
vs. Machine Learning based algorithms: Billsus and Pazzani
[24], through their work described in [32], attempt to
transform the formulation of the recommendation problem,
as viewed by the classic Collaborative Filtering algorithm,
into a Machine Learning problem, to which any supervised
learning algorithm can be applied. Their work is based on the
assumption that correlation-based approaches seem to work
well in the specific domain.

E. The Weighted Combination of Content-based and Collaborative Filtering

This approach defines two distinct filtering components. The
first component implements plain Collaborative Filtering,
while the second component implements Content-based
Filtering. The final rating prediction is calculated as a
weighted sum of these components, where the applied weights
are decided by how close the prediction of each component is
to the actual rating.
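One way to derive the weights from how close each component comes to the actual ratings is to make them inversely proportional to each component's mean absolute error on ratings already known. This weighting rule is an assumption for illustration, not necessarily the one used in the cited work.

```python
def component_weights(cf_preds, cb_preds, actual):
    """Weights for the collaborative (CF) and content-based (CB) components,
    inversely proportional to each component's mean absolute error on the
    ratings whose true value is known (hypothetical weighting rule)."""
    n = len(actual)
    mae_cf = sum(abs(cf_preds[key] - r) for key, r in actual.items()) / n
    mae_cb = sum(abs(cb_preds[key] - r) for key, r in actual.items()) / n
    eps = 1e-9  # avoids division by zero for a perfect component
    inv_cf, inv_cb = 1 / (mae_cf + eps), 1 / (mae_cb + eps)
    total = inv_cf + inv_cb
    return inv_cf / total, inv_cb / total


def combined_prediction(cf_value, cb_value, w_cf, w_cb):
    """Final rating as the weighted sum of the two component predictions."""
    return w_cf * cf_value + w_cb * cb_value
```

A component whose past errors are half as large receives twice the weight, so the combination leans towards whichever filter has been more accurate so far.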

V. Present scenario of research in recommendation systems

Recommender systems have been evaluated in many, often
incomparable, ways. In the present scenario, evaluation
involves the user tasks being evaluated, the types of analysis
and datasets being used, the ways in which prediction quality
is measured, the evaluation of prediction attributes other than
quality, and the user-based evaluation of the system as a
whole. In addition to reviewing the evaluation strategies used
by prior researchers, we present empirical results from the
analysis of various accuracy metrics on one content domain,
where all the tested metrics collapsed roughly into three
equivalence classes. Metrics within each equivalence class
were strongly correlated, while metrics from different
equivalence classes were uncorrelated.

Dimensions for User Evaluation 

Explicit (ask) vs. implicit (observe): A basic distinction is
between evaluations that explicitly ask users about their 
reactions to a system and those that implicitly observe user 
behavior. The first type of evaluation typically employs 
survey and interview methods. The second type usually 
consists of logging user behavior, then subjecting it to 
various sorts of analyses. 

VI. Challenging problems in recommendation systems

This section considers several current challenges of
recommender systems. The first set of challenges concerns
issues of bringing people together into communities of
interest. A major concern here is respecting people's privacy.
The second challenge is to create recommendation algorithms
that combine multiple types of information, possibly acquired
from different sources at different times.

Establishing the user tasks to be supported by a system, and
selecting a data set on which performance can be measured,
enables empirical experimentation: scientifically repeatable
evaluations of recommender system utility. A majority of the
published empirical evaluations of recommender systems to
date have focused on the evaluation of a recommender
system's accuracy. We assume that if a user could examine all
items available, they could place those items in an ordering of
preference. An accuracy metric empirically measures how
close a recommender system's predicted ranking of items for
a user is to the user's true ranking of preference. Accuracy
measures may also measure how well a system can predict an
exact rating value for a specific item. Researchers who want
to quantitatively compare the accuracy of different
recommender systems must first select one or more metrics.
In selecting a metric, researchers face a range of questions.
Will a given metric measure the effectiveness of a system with
respect to the user tasks for which it was designed? Are
results with the chosen metric comparable to other published
research work in the field? Are the assumptions on which a
metric is based true? Will a metric be sensitive enough to
detect real differences that exist? How large a difference
does there have to be in the value of a metric for a
statistically significant difference to exist? These questions
have not yet been fully answered in the published literature.
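Two of the most widely used predictive accuracy metrics, mean absolute error (MAE) and root mean squared error (RMSE), can be computed as below. Both compare predicted ratings with the ratings users actually gave. This sketch is only illustrative and does not cover ranking-based metrics.

```python
import math


def mae(pairs):
    """Mean absolute error over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))
```

For the pairs (4, 5), (3, 3) and (2, 4), MAE is 1.0 while RMSE is about 1.29, because the single error of 2 is squared before averaging.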

The challenge of selecting an appropriate metric is 
compounded by the large diversity of published metrics that 
have been used to quantitatively evaluate the accuracy of 
recommender systems. This lack of standardization is 
damaging to the progress of knowledge related to 
collaborative filtering recommender systems. With no 
standardized metrics within the field, researchers have 
continued to introduce new metrics when they evaluate their 
systems. With a large diversity of evaluation metrics in use, 
it becomes difficult to compare results from one publication 
to the results in another publication. As a result, it becomes 
hard to integrate these diverse publications into a coherent 
body of knowledge regarding the quality of recommender 
system algorithms. 

VII. Future

Recommender systems are a powerful new technology for 
extracting additional value for a business from its customer 
databases. The systems help customers find products they 
want to buy from a business. Recommender systems benefit 
customers by enabling them to find products they like. 
Conversely, they help the business by generating more sales. 
Recommender systems are rapidly becoming a crucial tool 
in E-commerce on the Web. Recommender systems are 
being stressed by the huge volume of customer data in 
existing corporate databases, and will be stressed even more 
by the increasing volume of customer data available on the 
Web. New technologies are needed that can dramatically 
improve the scalability of recommender systems. 



Web recommendation is seen as a fully automated process,
powered by operational knowledge. A number of systems
following many approaches have been developed, using
methods and techniques from Web recommendation. In
addition to the functions employed by existing
systems, many other interesting ones have been neglected so 
far. The combination of recommendation and customization 
functionality has been seen as the main solution to the 
information overload problem and the creation of loyal 
relations between the Web site and its visitors. However, 
other functions such as task performance support and user 
tutoring can certainly improve the experience of a Web site 
visitor. It should be noted at this point that Web
recommendation is a very active research field and new 
approaches related to its application appear on a regular 
basis. As a result, there are a number of unsolved technical 
problems and open issues. Some of these have been 
presented in this survey. New techniques and possibly new 
models for acquiring data are needed. One serious issue 
concerning data collection is the protection of the user’s 
privacy. A poll by KDnuggets (15/3/2000 to 30/3/2000) 
revealed that about 70% of the users consider Web 
recommendation as a compromise of their privacy. Thus, it 
is imperative that new tools are transparent to the user, by 
providing access to the data collected and clarifying the use 
of these data, as well as the potential benefits for the user. 
At the same time, one should be very careful not to burden 
the user with long-winded form-filling procedures, as these 
discourage users from accessing a Web site. Even the simple 
process of user registration is unacceptable for some Web- 
based services. 

In addition to the various improvements to the Web
recommendation process, there are a number of other issues
that need to be addressed in order to develop effective Web
personalization systems. Among the open issues mentioned in
this survey, the treatment of time in user models stands out
as particularly difficult. The main source of difficulty is that
the manner in which the behavior of users changes over time
varies significantly with the application and possibly the
type of user. Therefore, any solution to this problem
should be sufficiently parametric to cater for the
requirements of different applications. It is therefore evident
that the integration of Web recommendation with the
Apriori algorithm has introduced a number of
methodological and technical issues, some of which are still
open. At the same time the potential of this synergy between 
the two processes has barely been realized. As a result, a 
number of interesting directions remain unexplored. This 
survey has identified promising directions, providing at the 
same time a vehicle for exploration, in terms of Web 
recommendation system tools and methods. 

VIII. Conclusions 

Web recommendation is an emerging technology that can help
in producing personalized Web-based systems. This article
provides a survey of the work in recommendation systems,
focusing on their application and future. The survey aims to
serve as a source of ideas for people working on
recommendation in information systems, particularly those
systems that are accessible over the Web. Since the current
Web is largely unorganized and information volumes are
growing rapidly, recommendation systems, whose major
purpose is to reduce irrelevant content and to provide users
with more pertinent and tailored information, have become
an important research area. A key issue in this area is how
to discover users' interests and behavior effectively.

The Apriori algorithm was selected for the Web
recommendation system because it is a common technique
for association-based analysis. By applying this algorithm to
user systems, the relationship between the accessed pages and
visitors can be efficiently maintained. Web usage patterns and
user behavior can also be analyzed with this algorithm,
whereas a descriptive statistics approach cannot perform this
analysis. The results and findings of this analysis are more
reliable but less accurate, because of the Apriori property that
the same selected itemsets are always counted. The results and
findings from this experimental analysis are useful for Web
administrators seeking to improve Web services and
performance through the improvement of Web sites, including
their contents, structure, presentation, and delivery.

We hope that the framework and survey presented in this
paper will lead to more systematic research on
recommendation systems using the Apriori algorithm.
Fast Association Rule Mining Algorithm for Spatial Gene Expression Data
Abstract- One of the important problems in data mining is
discovering association rules from spatial gene expression data,
where each transaction consists of a set of genes and probe
patterns. The most time-consuming operation in this
association rule discovery process is computing the frequency
of occurrence of interesting subsets of genes (called
candidates) in the database of spatial gene expression data. A
fast algorithm is proposed that generates frequent itemsets,
without generating candidate itemsets, together with strong
association rules. The proposed algorithm uses a Boolean
vector with the relational AND operation to discover frequent
itemsets. Experimental results show that combining the
Boolean vector and the relational AND operation discovers
frequent itemsets and association rules more quickly than the
general Apriori algorithm.

Keywords - Spatial Gene expression data, Association Rule, 
Frequent itemsets, Boolean vector, relational AND 
operation, Similarity Matrix. 

I. Introduction 

There has been a great explosion of genomic data in
recent years. This is due to advances in various
high-throughput biotechnologies, such as spatial gene
expression databases. These large genomic data sets are
information-rich and often contain much more information
than the researchers who generated the data might have
anticipated. Such an enormous data volume enables new
types of analyses, but also makes it difficult to answer
research questions using traditional methods. Analysis of
these massive genomic data has two important goals:

I) To try to determine how the expression of any 
particular gene might affect the expression of other 
genes 

II) To try to determine what genes are expressed as a 
result of certain cellular conditions, e.g. what 
genes are expressed in diseased cells that are not 
expressed in healthy cells? 

The most popular pattern discovery method in data mining
is association rule mining. Association rule mining was
introduced in [4]. It aims to extract interesting correlations,
frequent patterns, associations, or causal structures among
sets of items in transaction databases or other data
repositories. The relationships are not based on inherent
properties of the data themselves but rather on the
co-occurrence of the items within the database.
The associations between items are commonly expressed in
the form of association rules. In this setting, the attributes
that represent items are assumed to take only two values and
are thus referred to as Boolean attributes. If an item is
contained in a transaction, the corresponding attribute value
will be 1; otherwise the value will be 0. Many interesting and
efficient algorithms have been proposed for mining
association rules for these Boolean attributes, for example,
Apriori [3], DHP [6], and partition algorithms [7]. Currently
most association mining algorithms are dedicated to frequent
itemset mining. These algorithms are defined in such a way
that they only find rules with high support and high
confidence. A characteristic of frequent itemset mining is that
it relies on there being a meaningful minimum support level
that is sufficiently high to reduce the number of frequent
itemsets to a manageable level. A huge amount of calculation
and a complicated transaction process are required during the
frequent itemset generation procedure. Therefore, the mining
efficiency of Apriori-like algorithms is very unsatisfactory
when the transaction database is very large, particularly for
spatial gene expression databases.
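The Boolean-vector idea mentioned in the abstract can be illustrated by counting support with bitwise AND. In this sketch, which does not reproduce the proposed algorithm in full, each item is mapped to a Python integer whose bit t is set when transaction t contains the item. The support of an itemset is then the number of set bits in the AND of its items' vectors, so no further pass over the transactions is needed once the vectors are built.

```python
def item_vectors(transactions):
    """Map each item to an int whose bit t is 1 iff transaction t contains it."""
    vectors = {}
    for t, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors


def support(itemset, vectors, n_transactions):
    """Support count of `itemset`: the number of 1 bits left after ANDing
    the Boolean vectors of all its items."""
    acc = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        acc &= vectors.get(item, 0)
    return bin(acc).count("1")
```

For example, with the gene transactions {g1, g2}, {g1, g3}, {g1, g2, g3} and {g2}, the vectors of g1 and g2 are 0111 and 1101 (bit 0 on the right), and their AND, 0101, gives a support count of 2.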

In this paper, an attempt has been made to propose a novel
algorithm for mining association rules from spatial gene
expression data.

II. Materials and methods 
A. Spatial Gene Expression Data 

The Edinburgh Mouse Atlas gene expression database
(EMAGE) is being developed as part of the Mouse Gene
Expression Information Resource (MGEIR) [1] in
collaboration with the Jackson Laboratory, USA. EMAGE
(http://genex.hgu.mrc.ac.uk/Emage/database) is a freely
available, curated database of gene expression patterns
generated by in situ techniques in the developing mouse
embryo. The spatial gene expression data are presented as an
NxN similarity matrix. Each element in the matrix is a
measure of similarity between the corresponding probe
pattern and gene-expression region. The similarity is
calculated as the ratio of the overlap between the two regions
to the total area covered by both regions in the images. This
measure is intuitive and is commonly referred to as the
Jaccard index [2]. When a pattern is compared to itself, the
Jaccard value is 1 because the two input spatial regions are
identical. When it is compared to another pattern, the Jaccard
index will be less than one. If the Jaccard index is 0, the two
patterns do not intersect. The closer the Jaccard index is to 1,
the more similar the two patterns are.
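The Jaccard index used here is |A ∩ B| / |A ∪ B|, the size of the overlap divided by the size of the union. A minimal sketch follows. It assumes each spatial region is given as a set of pixel coordinates and, by convention, treats two empty regions as identical.

```python
def jaccard(region_a, region_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two regions given as sets of pixels."""
    union = region_a | region_b
    return len(region_a & region_b) / len(union) if union else 1.0
```

Two regions of four and three pixels that share two pixels have a union of five pixels, giving an index of 2/5 = 0.4.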

However, biologists are more interested in how gene
expression changes under different probe patterns. Thus,
these similarity values are discretized: similarity measures
greater than a predetermined threshold are set to 1, the rest
to 0, and the matrix is converted into a Boolean matrix.

B. Data Preprocessing 

Preprocessing is often required before applying data mining
algorithms, in order to improve the quality of the results.
The preprocessing procedures are used to map each data
value to either 0 or 1. The values contained in the spatial gene
expression matrix had to be transformed into Boolean values
by a so-called discretization phase. In our context, each
quantitative value was subjected to four different
discretization procedures [2]: the Max minus x% method, the
Mid-range-based cutoff method, the x% cutoff method, and
the x% of highest value method.

The Max minus x% procedure consists of identifying the highest
expression value (HV) in the data matrix, and defining a
value of 1 for the expression of the gene in the given data
when the expression value was above HV - x% of HV,
where x is an integer value. Otherwise, the expression of the
gene was assigned a value of 0 (Figure 1a).

The Mid-range-based cutoff (Figure 1b) identifies the highest
and lowest expression values in the data matrix, and the
mid-range value is defined as being equidistant from these two
numbers (their arithmetic mean). Then, all expression values
below or equal to the mid-range were set to 0, and all values
strictly above the mid-range were set to 1.

The x% of highest value approach (Figure 1c) identifies data
whose level of expression is in the 5% of highest values.
These are assigned the value 1, and the rest were set to 0.
The Value greater than x% approach (Figure 1d) identifies the
level of expression and assigns the value 1 when it is greater
than a given percentage, and the rest are set to 0.
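The four discretization procedures can be sketched as follows for a flat list of values (a matrix would first be flattened). The text does not fully specify the last procedure. Here "value greater than x%" is assumed to mean a value above x/100 on a similarity scale in [0, 1], and all values tied at the threshold in the top-x% method are set to 1. The function names are hypothetical.

```python
def max_minus_x(values, x):
    """1 where the value exceeds HV - x% of HV, HV being the highest value."""
    hv = max(values)
    cut = hv - hv * x / 100
    return [1 if v > cut else 0 for v in values]


def mid_range(values):
    """1 where the value is strictly above the mean of the max and the min."""
    mid = (max(values) + min(values)) / 2
    return [1 if v > mid else 0 for v in values]


def top_x_percent(values, x):
    """1 for the values ranked in the highest x% (at least one value)."""
    n_top = max(1, round(len(values) * x / 100))
    threshold = sorted(values, reverse=True)[n_top - 1]
    return [1 if v >= threshold else 0 for v in values]


def greater_than_x(values, x):
    """1 where the value exceeds x% on a [0, 1] similarity scale (assumption)."""
    return [1 if v > x / 100 else 0 for v in values]
```

Applied to the same data, the four functions generally produce matrices of different densities, which is the effect discussed next.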

These four procedures resulted in different matrix densities:
the first and last procedures resulted in the same number of
Boolean 1 values for all gene expressions, whereas the second
and fourth procedures generated densities of 1 that depended
on the gene expression pattern throughout the various data
matrices. From the similarity matrix, two different sets of
transactions are constructed, which in turn lead to two
different types of association rules.

I) The items I are genes from the data set, where a
transaction T ⊆ I consists of genes that all have an
expression pattern intersecting with the same probe
pattern.

II) The items I are the probe patterns, where a
transaction T ⊆ I consists of probe patterns all
intersecting with the expression patterns in the
same image.

To create the first type of transactions, for each probe pattern r we take every gene g whose associated gene expression pattern ge satisfies the minimum similarity P, i.e., similarity(r, ge) > P, to form the itemsets.

The second type of transactions is created in a similar way. For each gene expression pattern ge in the database, we create an itemset consisting of the probe patterns that intersect with ge. Each probe pattern r must satisfy the minimum similarity P, i.e., similarity(r, ge) > P, to be included in the itemset.
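Both transaction types can be derived from the same similarity matrix. A minimal Python sketch, assuming the similarity matrix is stored as a nested dictionary `sim[g][r]` (gene, then probe pattern) and that P is a plain number; the names are hypothetical.

```python
def build_transactions(sim, p):
    """sim[g][r]: similarity between gene g's expression pattern and probe pattern r.

    Returns (type1, type2):
      type1[r] = genes whose pattern has similarity > p with probe pattern r
      type2[g] = probe patterns whose similarity with gene g's pattern is > p
    """
    probes = sorted({r for row in sim.values() for r in row})
    type1 = {r: {g for g, row in sim.items() if row.get(r, 0.0) > p} for r in probes}
    type2 = {g: {r for r, s in row.items() if s > p} for g, row in sim.items()}
    return type1, type2


if __name__ == "__main__":
    sim = {"g1": {"r1": 0.9, "r2": 0.2}, "g2": {"r1": 0.7, "r2": 0.8}}
    t1, t2 = build_transactions(sim, 0.5)
    print(t1)  # r1 -> {g1, g2}, r2 -> {g2}
    print(t2)  # g1 -> {r1}, g2 -> {r1, r2}
```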
Fig. 1a. Results of Max minus 25% method
Fig. 1b. Results of Mid-range-based cutoff
Fig. 1c. Results of x% of highest value approach

Fig. 1. Schematic description of the discretization protocols used.



C. Association Rule Mining 

The Apriori-like algorithms adopt an iterative method to discover frequent itemsets, which requires multiple passes over the data. The algorithm starts from the frequent 1-itemsets and continues until all maximal frequent itemsets are discovered. The Apriori-like algorithms consist of two major procedures: the join procedure and the prune procedure. The join procedure combines two frequent k-itemsets that share the same (k-1)-prefix to generate a (k+1)-itemset as a new preliminary candidate. Following the join procedure, the prune procedure removes from the preliminary candidate set every itemset that has a k-subset that is not frequent [3].
From every frequent itemset of size k >= 2, two subsets A and C are constructed such that C contains exactly one item and the remaining k-1 items go to A. By the downward closure property of frequent itemsets, both subsets are also frequent and their supports have already been calculated. These two subsets generate a rule A → C if the confidence of the rule, support(A ∪ C) / support(A), is greater than or equal to the specified minimum confidence.
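The join and prune procedures of the classic Apriori candidate generation can be sketched in Python as follows. This is the baseline Apriori step, not the proposed Boolean-matrix algorithm; itemsets are assumed to be frozensets of sortable items, and `apriori_gen` is an illustrative name.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Generate candidate (k+1)-itemsets from a set of frequent k-itemsets.

    frequent_k: set of frozensets, all of the same size k (items must be sortable).
    """
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    if not ordered:
        return set()
    k = len(ordered[0])
    candidates = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] != b[:-1]:
                continue  # join only itemsets sharing the same (k-1)-prefix
            cand = a + (b[-1],)
            # prune: every k-subset of the candidate must itself be frequent
            if all(frozenset(sub) in frequent_k for sub in combinations(cand, k)):
                candidates.add(frozenset(cand))
    return candidates


if __name__ == "__main__":
    f2 = {frozenset(s) for s in [(0, 1), (0, 2), (1, 2), (1, 3)]}
    print(apriori_gen(f2))  # {0, 1, 2}; {1, 2, 3} is pruned because {2, 3} is not frequent
```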

D. Algorithm Details 

I) Let I = {i1, i2, ..., in} be a set of items, where each item ij corresponds to a value of an attribute and is a member of some attribute domain Dh = {d1, d2, ..., ds}, i.e., ij ∈ Dh. If I is a binary attribute, then Dom(I) = {0, 1}. A transaction database is a database containing transactions of the form (d, E), where d ∈ Dom(D) and E ⊆ I.

II) Let D be a transaction database, n be the number of transactions in D, and minsup be the minimum support of D. The new_support is defined as new_support = minsup × n.

III) Proposition 1: Using Boolean vectors with the AND operation, if the number of 1s in a row vector Bi is smaller than k, Bi need not be involved in the calculation of the k-supports.

IV) Proposition 2 [5]: Suppose itemset X is a k-itemset, and let |Fk-1(j)| denote the number of itemsets in the frequent set Fk-1 that contain item j. If there is an item j in X for which |Fk-1(j)| is smaller than k-1, then X is not a frequent itemset.

V) Proposition 3: Let |Fk| denote the number of k-itemsets in the frequent set Fk. If |Fk| is smaller than k+1, the maximum length of the frequent itemsets is k.

The proposed algorithm for finding association rules in spatial gene expression data, given in the form of a similarity matrix, consists of the following five phases.

1. Transforming the similarity matrix into the 
Boolean matrix 

2. Generating the set of frequent 1-itemsets F1

3. Pruning the Boolean matrix 

4. Generating the set of frequent k-itemsets Fk (k > 1)

5. Generating association rules from the generated 
frequent itemsets with confidence value greater 
than a predefined threshold (minconfidence). 

The proposed algorithm is described in detail as follows:

Input: Spatial Gene Expression data in similarity matrix 
(M), the minimum support, and minimum confidence. 
Output: Set of frequent itemsets F and Association rules. 

1. Normalize the data matrix M and transform it into the Boolean matrix B;
// Frequent 1-itemset generation
2. For each column Ci of B
3.    If sum(Ci) >= new_support
4.       F1 = F1 ∪ {Ii};
5.    Else delete Ci from B;
// By Proposition 1
6. For each row Rj of B
7.    If sum(Rj) < 2
8.       Delete Rj from B;
// By Propositions 2 and 3
9. k = 2; While (|Fk-1| > k-1)
10. {
    // Join procedure
11.   Produce all k-vector combinations of the columns of B;
12.   For each k-vector combination {Bi1, Bi2, ..., Bik}
13.   { E = Bi1 AND Bi2 AND ... AND Bik;
14.      If sum(E) >= new_support
15.         Fk = Fk ∪ {{Ii1, Ii2, ..., Iik}};
16.   }
    // Prune procedure
17.   For each item Ii whose column Bi remains in B
18.      If |Fk(Ii)| < k
19.         Delete the column Bi corresponding to item Ii from B;
20.   For each row Rj of B
21.      If sum(Rj) < k+1
22.         Delete Rj from B;
23.   k = k+1;
24. }
25. F = F1 ∪ F2 ∪ ... ∪ Fk
// Rule generation
26. For each itemset X in Fk, for all k >= 2 do
27.    For all i <= k do
28.       c = X[i]
29.       a = X - {c}
30.       If support(X) / support(a) >= minconfidence
31.          Declare a → c as a rule
32.    enddo
33. enddo
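A runnable Python sketch of the steps above, under stated assumptions: the Boolean matrix B is already given (step 1, normalization and discretization, is assumed done), minsup is a fraction of the number of rows, each item column is stored as a Python integer used as a bit-vector so that the AND of step 13 is a single `&`, and "deleting" a row clears its bit in a mask. As in step 11, it enumerates k-column combinations of the pruned matrix instead of joining prefixes. The function names are hypothetical; the authors' Java implementation is not reproduced here.

```python
from itertools import combinations


def popcount(x):
    """Number of 1 bits in a non-negative int (a bit-vector over rows)."""
    return bin(x).count("1")


def boolean_matrix_mining(rows, minsup, minconf):
    """rows: Boolean matrix B as 0/1 lists (one row per transaction, one
    column per item). Returns (support, rules): support maps each frequent
    itemset (frozenset of column indices) to its count; rules holds
    (antecedent, consequent, confidence) triples."""
    n = len(rows)
    new_support = minsup * n
    # Column i as an int: bit r is set when rows[r][i] == 1.
    cols = {i: sum(1 << r for r, row in enumerate(rows) if row[i])
            for i in range(len(rows[0]))} if rows else {}
    support, levels = {}, {1: []}

    # Steps 2-5: frequent 1-itemsets; infrequent columns are deleted.
    for i in list(cols):
        s = popcount(cols[i])
        if s >= new_support:
            levels[1].append(frozenset([i]))
            support[frozenset([i])] = s
        else:
            del cols[i]

    def prune_rows(live, min_ones):
        # Proposition 1: a row with fewer than min_ones 1s in the remaining
        # columns cannot contain any itemset of that size.
        for r in range(n):
            if live >> r & 1 and sum(rows[r][i] for i in cols) < min_ones:
                live &= ~(1 << r)
        return live

    live = prune_rows((1 << n) - 1, 2)  # steps 6-8

    k = 2
    while len(levels[k - 1]) > k - 1:  # Proposition 3 stopping test
        levels[k] = []
        # Steps 11-16: AND the bit-vectors of every k-column combination.
        for combo in combinations(sorted(cols), k):
            e = live
            for i in combo:
                e &= cols[i]
            s = popcount(e)
            if s >= new_support:
                levels[k].append(frozenset(combo))
                support[frozenset(combo)] = s
        # Steps 17-19 (Proposition 2): drop columns in fewer than k frequent k-itemsets.
        for i in list(cols):
            if sum(1 for X in levels[k] if i in X) < k:
                del cols[i]
        live = prune_rows(live, k + 1)  # steps 20-22
        k += 1

    # Steps 26-33: single-consequent rules a -> c with confidence >= minconf.
    rules = []
    for X, s in support.items():
        if len(X) < 2:
            continue
        for c in sorted(X):
            a = X - {c}
            conf = s / support[a]
            if conf >= minconf:
                rules.append((a, c, conf))
    return support, rules


if __name__ == "__main__":
    B = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1]]
    support, rules = boolean_matrix_mining(B, minsup=0.4, minconf=1.0)
    for a, c, conf in rules:
        print(sorted(a), "->", c, conf)  # [2] -> 1 1.0 and [0, 2] -> 1 1.0
```

Because a row or column is only dropped when it provably cannot contribute to a larger frequent itemset, the support counts of the itemsets that are kept are exact, which is what makes the confidence ratio in steps 30-31 well defined.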

III. Results and discussion

The proposed algorithm was implemented in Java and tested on a Linux platform. Comprehensive experiments on spatial gene expression data were conducted to study the impact of normalization and to compare the proposed algorithm with the Apriori algorithm. Figures 2 and 3 give the experimental results for execution time (generating frequent itemsets and finding rules) vs. the user-specified minimum support, and show that the response time of the proposed algorithm is much better than that of the Apriori algorithm. In this case, the confidence value is set to 100% for rule generation, which means that every generated rule holds in 100% of the cases.

Fig. 2. Performance on Stage 14 of EMAGE spatial gene expression data (Minsupport vs. Execution time)




Fig. 3. Performance on Stage 17 of EMAGE spatial gene expression data (Minsupport vs. Execution time)

Figures 4 and 5 give the experimental results for memory usage vs. the user-specified minimum support. The results show that the proposed algorithm uses less memory than the Apriori algorithm because it relies on the Boolean matrix and bitwise AND operations.




Fig. 4. Performance on Stage 14 of EMAGE spatial gene expression data (Minsupport vs. Memory usage)




Fig. 5. Performance on Stage 17 of EMAGE spatial gene expression data (Minsupport vs. Memory usage)



Fig. 6. Association rules and Minimum support in Apriori 
algorithm 




Fig. 7. Association rules and Minimum support in Proposed algorithm

The number of association rules decreases as the minimum support (or minimum confidence) increases under a fixed minimum confidence (or minimum support, respectively). This shows that an appropriate Minsupport (or Minconf) can constrain the number of association rules and avoid generating association rules that cannot yield a decision. These results are shown in Figures 6-7. They are as expected and consistent with our intuition.

IV. Conclusion 

In this paper, a novel method for mining frequent itemsets and strong association rules from spatial gene expression data is proposed to find frequently occurring genes very quickly. The proposed algorithm does not produce candidate itemsets, spends less time calculating the k-supports of the itemsets on the pruned Boolean matrix, scans the database only once, and needs less memory than the Apriori algorithm. Finally, the large and rapidly growing compendium of data demands data mining approaches, particularly association rule mining, and ensures that genomic data mining will remain a necessary and highly productive field for the foreseeable future.

Abstract- The traffic load of wireless LANs is often unevenly distributed among the access points (APs), which results in unfair bandwidth allocation among mobile users. We argue that the load imbalance and the consequent unfair bandwidth allocation can be greatly reduced by intelligent association control. In this paper, we present an efficient solution to determine the user-AP associations for bandwidth allocation. We show the strong correlation between fairness and load balancing, which enables us to use load-balancing techniques to obtain optimal fair bandwidth allocation. As this problem is NP-hard, we devise algorithms that achieve a constant-factor approximation. In our algorithms, we first compute a distributed association solution, in which users can be associated with multiple APs simultaneously with variable bandwidth. This solution guarantees the fairest bandwidth allocation in terms of max-min fairness; we then obtain the integral solution from the fractional solution by a distributed association algorithm. We also consider time fairness and present a polynomial-time algorithm for the optimal integral solution that ensures zero data loss.

Keywords- Distributed Association algorithms, IEEE 802.11 
WLANs, load balancing. 

I. Introduction 

Recent studies on operational wireless LANs (WLANs) have shown that the traffic load is often distributed unevenly among the access points (APs) serving mobile users (MUs). In WLANs, by default, each user scans all available channels to detect its nearby APs and associates itself with the AP that has the strongest received signal strength indicator (RSSI), while ignoring its load condition. As users are typically not uniformly distributed, some APs tend to suffer from heavy load while adjacent APs may carry only light load or be idle. Such load imbalance among APs is undesirable, as it prevents the network from providing fair services to its users. As suggested in existing studies, the load imbalance problem can be alleviated by balancing the load among the APs through intelligent selection of the user-AP association, termed association control. Association control can be used to achieve different objectives. For instance, it can be used to maximize the overall system throughput by shifting users to idle or lightly loaded APs and allowing each AP to serve only the users with maximal data rate. Clearly, this objective is not a desirable system behavior from the fairness viewpoint. A more desirable goal is to provide network-wide fair bandwidth allocation while maximizing the minimal fair share of each user. This type of fairness is known as max-min fairness. Informally, a bandwidth allocation is max-min fair if there is no way to give more bandwidth to any user without decreasing the allocation of a user with less or equal bandwidth. In this paper, we present efficient user-AP association control algorithms that ensure max-min fair bandwidth allocation, and we show that this goal can be obtained by balancing the load on the APs.
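To make the max-min definition concrete, here is a minimal progressive-filling sketch in Python for the simplest case: users with demand caps sharing a single resource of fixed capacity. It ignores multi-rate links and multiple APs, so it illustrates the definition only and is not one of the association algorithms of this paper; the function name is hypothetical.

```python
def max_min_fair(capacity, demands):
    """Progressive filling on one shared resource.

    Repeatedly split the remaining capacity equally among unsatisfied users;
    users whose residual demand fits in the equal share are capped at their
    demand and the leftover is redistributed in the next round.
    """
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = {i for i in active if demands[i] - alloc[i] <= share}
        if satisfied:
            for i in satisfied:
                remaining -= demands[i] - alloc[i]
                alloc[i] = demands[i]
            active -= satisfied
        else:
            # nobody can be fully served: give everyone the same share and stop
            for i in active:
                alloc[i] += share
            remaining = 0.0
    return alloc


if __name__ == "__main__":
    print(max_min_fair(10, [2, 8, 8]))  # [2, 4.0, 4.0]
```

In the usage example, the small demand is met in full and the remaining 8 units are split equally; no user can get more without taking bandwidth from a user who has less or equal bandwidth.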

II. Review of literature

Load balancing in WLANs has been intensely studied. In [1], an association algorithm was proposed for efficient bandwidth allocation with constant bandwidth. Studies [3]-[4] on operational wireless LANs (WLANs) have shown that the traffic load is often distributed unevenly among the access points (APs). In WLANs, by default, each user scans all available channels to detect its nearby APs and associates itself with the AP that has the strongest received signal strength indicator (RSSI), while ignoring its load condition. As users are typically not uniformly distributed, some APs tend to suffer from heavy load while adjacent APs may carry only light load or be idle. Such load imbalance among APs is undesirable, as it prevents the network from providing fair services to its users. As suggested in existing studies [6]-[7], the load imbalance problem can be alleviated by balancing the load among the APs through intelligent selection of the user-AP association, termed association control. Association control can be used to achieve different objectives. In [7]-[9], different association criteria are proposed. These metrics typically take into account factors such as the number of users currently associated with an AP, the mean RSSI, the RSSI of the new user and, in [8], the bandwidth a new user can get if it is associated with an AP. Various WLAN vendors have incorporated proprietary features in the device driver's firmware [10], [11]. In these proprietary solutions, the APs broadcast their load conditions to the users via Beacon messages, and each user chooses the least loaded AP. Other proposals associate new users with the AP that can provide the minimal bandwidth required by the user; if there is more than one such AP, the one with the strongest signal is selected. Most of these heuristics only determine the association of newly arrived users. Tsai and Lien [8] propose to reassociate users when some conditions are violated. Load balancing in cellular networks is usually achieved via dynamic channel allocation (DCA) [12].

III. Wireless and wired bottlenecks

The wireless link is generally considered to be the bottleneck. However, this assumption is not always valid. For instance, consider a WLAN where the APs are connected to the infrastructure via T1 lines, whose capacity is around 1.5 Mb/s, as illustrated in Example 2. Example 2 demonstrates the need to consider both the wireless and the wired links for load balancing.

Fig. 1. Examples of bottlenecks both over the wireless and the wired (AP-infrastructure) links. (a) An unfair association. (b) The optimal association.

Example 2: Consider a wireless system with 2 APs, a and b, and 6 users, enumerated from 1 to 6, as depicted in Fig. 1. Users 1, 2, 3 and 4 experience a bit rate of 2 Mb/s from both APs, while users 5 and 6 have a bit rate of 1 Mb/s from both APs. The APs are connected to a fixed network with T1 lines of capacity 1.5 Mb/s. In the following, we consider two possible associations and analyze the average bandwidth that they provide to the users.

Case I: A fair user association only from the wireless perspective. Consider the association depicted in Fig. 1(a). Here, the system can allocate a bandwidth of 0.5 Mb/s to each user over the wireless links. However, while AP a can allocate a bandwidth of 0.5 Mb/s to users 5 and 6 on its T1 line, AP b can only provide 3/8 Mb/s to each of its associated users over its line, since its 1.5 Mb/s T1 line is shared by four users. In this case, the wireless link of AP a is the bottleneck that affects the bandwidth allocation, whereas the wired link is the bottleneck of AP b.

Case II: A fair user association. Consider the association shown in Fig. 1(b). This association provides a bandwidth of 0.5 Mb/s to each user over both the wired and the wireless channels. Observe that in this case different users may obtain different service times on the wireless links and wired backhauls. For instance, user 5 captures 1/3 of the service time of the T1 link of its AP (0.5 of 1.5 Mb/s), while it is served 1/2 of the time by its wireless channel (0.5 of its 1 Mb/s rate). This ensures that user 5 indeed receives a bandwidth of 0.5 Mb/s.
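The arithmetic of Example 2 can be checked with a small Python sketch. It assumes that an AP gives all its users the same bandwidth b, limited by wireless airtime (b times the sum of 1/r_u over its users must not exceed 1) and by its T1 backhaul (b times the number of users must not exceed 1.5 Mb/s). The Case II association used below (each AP serving two 2 Mb/s users and one 1 Mb/s user) is an assumption, since Fig. 1(b) is not reproduced here; it is, however, consistent with the 1/3 and 1/2 service-time fractions quoted for user 5. Names are hypothetical.

```python
from fractions import Fraction

T1 = Fraction(3, 2)  # backhaul (T1) capacity of each AP, in Mb/s
RATE = {1: 2, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1}  # wireless bit rate (Mb/s), same from both APs


def equal_share(users, backhaul=T1):
    """Largest equal bandwidth (Mb/s) an AP can give every user in `users`.

    Wireless: each user u needs b / RATE[u] of the airtime, and airtime sums to 1.
    Wired: the users split the backhaul, so b * len(users) <= backhaul.
    """
    wireless = 1 / sum(Fraction(1, RATE[u]) for u in users)
    wired = backhaul / len(users)
    return min(wireless, wired)


case_1 = {"a": [5, 6], "b": [1, 2, 3, 4]}  # Fig. 1(a), as described in the text
case_2 = {"a": [1, 2, 5], "b": [3, 4, 6]}  # ASSUMED split for Fig. 1(b)

for name, assoc in (("Case I", case_1), ("Case II", case_2)):
    print(name, {ap: str(equal_share(us)) for ap, us in assoc.items()})
# Case I {'a': '1/2', 'b': '3/8'}
# Case II {'a': '1/2', 'b': '1/2'}

share = equal_share(case_2["a"])
print(share / T1, share / RATE[5])  # user 5: 1/3 of the T1 time, 1/2 of the airtime
```

In Case I, AP a is limited by airtime (1/2 < 3/4) and AP b by its T1 line (3/8 < 1/2); in the assumed Case II both limits coincide at 1/2 Mb/s for both APs.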



IV. Fairness and load balancing

In this section, we provide formal definitions of fair bandwidth allocation and load balancing. Additionally, we describe some useful properties that we need for constructing our algorithmic tools. In the following, we consider two association models. The first is a single-association model, called integral association, where each user is associated with a single AP at any given time. This is the association mode used in IEEE 802.11 networks. The second is a multiple-association model, also termed fractional association, which allows each user to be associated with several APs and to receive communication services from them simultaneously. Accordingly, a user may receive several different traffic flows from different APs, and its bandwidth allocation is the aggregated bandwidth of all of them. This model is used to develop our algorithmic tools for the integral-association case. For both association models, we denote by Ua the set of users associated with AP a ∈ A, and by Au the set of APs that user u ∈ U is associated with.

V. Distributed association algorithm

In this section, after describing the details of the distributed AP selection algorithm for APs and MUs, we analyze the stability and overhead of the proposed algorithm.

A. Association Algorithm for APs and MUs 

Based on the information exchanged between MUs and APs, the proposed association scheme can be summarized as Algorithm 1, shown in Fig. 2. In the legacy IEEE 802.11 standard, the management packets from the AP do not contain any field indicating the AP load. To realize the proposed scheme, one additional field must be added to the beacon and probe packets. Moreover, due to the dynamic nature of the wireless network and the mobility of MUs, the APs should keep updating the AP load by an iterative moving average as

$$y_a(t+T_\Omega) = \alpha\, y_a(t) + (1-\alpha) \sum_{u \in U_a} \ell_u(t) \qquad (2)$$

where ℓu(t) denotes the current load contributed by MU u ∈ Ua, TΩ is the fixed updating interval, and 0 < α < 1 is the weighting parameter that trades off the previously estimated AP load against the current value. If an MU is not associated with any AP in the network, it immediately scans all channels by sending probe request messages and receives response packets from the available APs. By detecting the respective RSSI levels to the APs, each MU can determine the most suitable physical data rate for transmitting packets. The proposed AP selection strategy lets each MU choose the AP with the least estimated load, where the load of each available AP is estimated by supposing that the MU will be associated with it. That is, if the newly joining MU u can be served by a subset of APs Au ⊆ A, the estimated AP load on a ∈ Au, supposing the association of MU u with AP a, is updated according to Eq. (3).

Algorithm 1 Association algorithm for each AP and MU.

Periodic operation on each AP a with interval TΩ:
1. Periodically update its AP load by Eq. (2).

Periodic operation on each MU u with interval TA:
1. Exchange the probing packets with the APs.
2. Calculate the estimated AP load by Eq. (3).
3. if u is a newly joining MU then
4.    MU u selects the AP argmin_{a ∈ Au} ya(t).
5. else /* u is already associated with AP a */
6.    if switching to a' leads to ya(t) - ya'(t) > δ then
7.       MU u switches its association to a'.
8.    end if
9. end if

Fig. 2. The distributed algorithm for load balancing in WLANs.

The MU then selects the AP argmin_{a ∈ Au} ya(t). After the MU joins the WLAN, it periodically (with period TA) detects the load information from the neighboring APs and changes its association if the AP loads can be further decreased. This operation is necessary not only to reduce the effect of the joining order of MUs but also to let the MU adapt to the dynamic wireless environment and topology changes. The period TA, configured to be more than 10 seconds, is much longer than the load-updating period TΩ on the AP.
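A minimal Python sketch of the AP-side update of Eq. (2) and the two MU-side decisions of Algorithm 1. Eq. (3) is not reproduced in this text, so the sketch simply takes each AP's estimated load as an input; the function names, the dictionary representation and the default δ = 0 are illustrative assumptions.

```python
def update_ap_load(prev_load, current_load, alpha):
    """Eq. (2): exponentially weighted moving average of the AP load.

    current_load is the load measured over the last updating interval
    (the sum term of Eq. (2)); 0 < alpha < 1 weights the previous estimate.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    return alpha * prev_load + (1 - alpha) * current_load


def select_ap(estimated_load):
    """Newly joining MU: pick the AP with the smallest estimated load."""
    return min(estimated_load, key=estimated_load.get)


def should_switch(current_ap_load, estimated_new_load, delta=0.0):
    """Associated MU: switch only if the load drop exceeds the threshold delta."""
    return current_ap_load - estimated_new_load > delta


if __name__ == "__main__":
    print(update_ap_load(1.0, 0.0, 0.75))                   # 0.75
    print(select_ap({"a": 0.7, "b": 0.4, "c": 0.9}))        # b
    print(should_switch(0.7, 0.69, delta=0.05))             # False
```

A larger δ makes MUs less eager to reassociate, trading balance for fewer switches; δ = 0 is the setting used in the stability theorem below.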

B. Stability and Overhead Analysis

In dynamic WLANs, the association of MUs should vary with the network conditions. However, it is not intuitively obvious that the proposed distributed algorithm is self-stabilizing for static networks, that is, that MUs continually looking to balance the AP loads will eventually converge to a stable result in a static topology. Here we show that this process does indeed stabilize.

Theorem 1: For a WLAN with a fixed population of APs and static MUs that implement the above distributed association algorithm with δ = 0, the switching operations of the MUs in Algorithm 1 reach a stable state in which MUs cease changing their associated APs.²

Proof: The core of the proof is that the global lexicographic order [15] decreases monotonically whenever an MU switches its association. Lexicographic order, a concept borrowed from economics, can be used to compare the extent of fairness between two vectors. Given two vectors A and B, the lexicographic order is determined by sorting the original vectors (in non-increasing order) and then comparing the corresponding values index by index. According to Algorithm 1, assume that one MU switches from AP a to AP b, and denote the AP loads before and after the switch as ya, yb, y'a and y'b, respectively. Straightforwardly, we have yb < ya, y'b < ya and y'a < ya, so the lexicographic order has decreased. Since there are only finitely many possible associations of a fixed set of MUs, the lexicographic order cannot decrease infinitely, and we conclude that Algorithm 1 stops after a finite number of operations.
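The lexicographic comparison used in the proof can be illustrated in Python. This is a sketch with made-up load values, not measurements: sorting the AP loads in non-increasing order and comparing element by element shows that moving an MU off the most loaded AP, so that both affected loads end up below the old maximum, yields a lexicographically smaller vector. The function names are hypothetical.

```python
def lex_key(loads):
    """Sort AP loads in non-increasing order for lexicographic comparison."""
    return sorted(loads, reverse=True)


def lex_decreased(before, after):
    """True if the sorted load vector strictly decreased lexicographically."""
    return lex_key(after) < lex_key(before)  # Python compares lists lexicographically


if __name__ == "__main__":
    before = [0.9, 0.3, 0.5]   # an MU leaves AP 0 ...
    after = [0.6, 0.55, 0.5]   # ... and joins AP 1
    print(lex_key(before), lex_key(after), lex_decreased(before, after))
```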

The overhead introduced by the proposed algorithm on the AP is low. On each MU, the most time-consuming operation is the periodic probing process every TA seconds. However, according to measurements, this probing process takes only around 300 ms. Compared with the interval TA, the overhead is almost negligible.
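As a rough worked bound using only the figures quoted above (about 300 ms of probing per period, with TA configured above 10 s), the fraction of each MU's time spent probing is, writing T_probe for the probing time (our notation):

```latex
\frac{T_{\text{probe}}}{T_A} \lesssim \frac{0.3\,\text{s}}{10\,\text{s}} = 0.03
```

that is, at most about 3% of the MU's time, and less for longer TA.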

VI. Performance evaluation 

In this section, we first introduce the numerical evaluation based on the developed simulation program. The program can simulate dynamic and large-scale topologies to clearly show the achievable benefits of the proposed scheme. We then provide NS2 [16] simulation results for a medium-size topology with suddenly roaming clients. Finally, we describe our prototype implementation on a testbed built with ordinary computers. To measure performance, we use the total throughput Σ_{u∈U} θu as the metric of overall efficiency and Jain's fairness index [17] to denote the degree of load balancing in the network.
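For reference, Jain's fairness index over n values x_i (for instance, per-MU throughputs) is (Σ x_i)² / (n Σ x_i²); it equals 1 when all values are equal and 1/n when a single entity receives everything. A minimal Python sketch (the function name is hypothetical):

```python
def jain_index(x):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(x)
    total = sum(x)
    squares = sum(v * v for v in x)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero value")
    return total * total / (n * squares)


if __name__ == "__main__":
    print(jain_index([1, 1, 1, 1]))  # 1.0, perfectly fair
    print(jain_index([1, 0, 0, 0]))  # 0.25, one user gets everything
```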

² δ = 0 is the loosest condition for activating the switching operation.




Fig. 3. A snapshot of the developed numerical simulator.




Fig. 4. A realistic scenario with measured mobility for 
numerical simulation. The red squares denote the APs and 
the blue circles denote the MUs at the beginning of 
simulation. 

VII. Numerical simulation for realistic scenario 

In order to evaluate the proposed scheme for large-scale topologies, we have developed a discrete-event simulator based on SimPy [18], a Python framework for discrete-event simulation applications. Users can manually place the APs and MUs in the GUI (Graphical User Interface). The generated scenario can also be saved and loaded for future use. A snapshot of the program interface is shown in Fig. 3. To accelerate the simulation, the complex behavior of the IEEE 802.11 MAC is simplified, and the throughput is calculated by the throughput model given in [12]. We use a set of measured trace files provided by [19], which contain 20 minutes of measurement data capturing the realistic mobility patterns of MUs on the campus of Dartmouth University. From the measurement results, we pick 56 APs and 126 MUs with their mobility, placed in a rectangular topology of size 1100 × 1000 m², as shown in Fig. 4.




Fig. 5. The throughput difference between the RSSI-based scheme and the proposed scheme w.r.t. simulation time for the realistic topology shown in Fig. 4.





Fig. 6. The Jain's fairness value difference between the RSSI-based scheme and the proposed scheme w.r.t. simulation time for the realistic topology shown in Fig. 4.

According to Fig. 5 and Fig. 6, the total throughput achieved by the proposed scheme is generally the same as, or sometimes higher than, that of the default RSSI-based scheme. Moreover, the fairness metric clearly improves (by 20%-30%) after applying the proposed scheme. We also find that the MUs mostly need only one probing and reassociation operation to reach a steady state when they move around in the topology.

VIII. Conclusion 

In this paper, we have explored a load balancing scheme to guarantee throughput fairness among the MUs. To achieve this, we have proposed a distributed and self-stabilizing association scheme for the MUs in multi-rate WLANs. The proposed scheme gradually balances the AP loads in a distributed manner. Extensive simulations show that it can significantly improve, and sometimes nearly double, the throughput fairness among the MUs with low overhead. To show the feasibility of the proposed scheme, we have implemented a prototype on ordinary computers by modifying open-source wireless driver software packages. Our research is oriented toward practical WiFi products, and the scheme can be implemented with small additional modifications to achieve load balancing in deployed WLANs.

Abstract- As the Internet has evolved into a global commercial infrastructure, there has been a great demand for new applications of global reach that today's Internet protocols cannot adequately support. Real-time applications have stringent delay and delay jitter requirements, which cannot be adequately supported by today's Internet protocols.

As a result, in recent years, a large number of new Internet protocols have been developed in an attempt to meet this demand. Multi-Protocol Label Switching (MPLS) has been envisioned as an ideal platform upon which guaranteed services could be developed. Service guarantees are achieved by setting up and managing a set of primary and backup class-of-service (CoS) aware label switched paths across an IP domain. In addition to MPLS, this approach requires a suite of protocols to be implemented, e.g., DiffServ for quality of service (QoS), path protection/fast rerouting for link failure recovery (FR), and constraint-based routing for traffic engineering (TE).

The proposed thesis develops a family of distributed traffic control laws (DCLs) that allows optimal, multiple-CoS, multipath-based rate adaptation and load balancing. The DCLs drive the network to an operating point where a user-defined global utility function is maximized. The proposed family of DCLs has the capability to enable optimal, scalable QoS and traffic engineering simultaneously.

I. Introduction 

The transmission control protocol (TCP) window-based congestion control algorithms use minimal information from the network as input to allow fully distributed traffic control. In other words, the only feedback information needed for TCP window-based congestion control is whether or not the forwarding path is congested. This allows the TCP source node to infer path congestion by counting repeated acknowledgments of the same packet or by measuring the end-to-end round-trip delay, making TCP a truly end-to-end protocol that needs no assistance from the underlying internetworking layer infrastructure. This has made the proliferation of Internet applications at a global scale possible.
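As an illustration of how a source can infer congestion from repeated acknowledgments alone, here is a minimal Python sketch that flags congestion after three duplicate ACKs, the fast-retransmit threshold used by standard TCP congestion control (RFC 5681). It models only cumulative ACK numbers and is not a TCP implementation; the function name is hypothetical.

```python
def infer_congestion(ack_numbers, dup_threshold=3):
    """Return positions in the ACK stream where congestion is inferred.

    A duplicate ACK repeats the previous cumulative ACK number; reaching
    dup_threshold duplicates signals a likely lost packet (congestion).
    """
    events = []
    last_ack, dups = None, 0
    for pos, ack in enumerate(ack_numbers):
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:
                events.append(pos)
        else:
            last_ack, dups = ack, 0
    return events


if __name__ == "__main__":
    print(infer_congestion([1, 2, 3, 3, 3, 3, 4, 5]))  # [5]
```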

An excellent example is the fast, ubiquitous adoption of the World Wide Web due to its use of TCP as its underlying transport. However, as the Internet has evolved into a global commercial infrastructure, there has been a great demand for new applications of global reach that today's Internet protocols cannot adequately support.

For example, real-time applications, such as voice over IP (VoIP) and videophone, have stringent delay and delay jitter requirements, which cannot be adequately supported by today's Internet protocols. As a result, in recent years, a large number of new Internet protocols have been developed in an attempt to meet this demand.

For example, multiprotocol label switching (MPLS) has 
been envisioned as an ideal platform upon which guaranteed 
services could be developed. Service guarantee is achieved 
by setting up and managing a set of primary and backup 
class-of-service (CoS) aware label switched paths across an 
IP domain. 

In addition to MPLS, this approach requires a suite of protocols to be implemented, e.g., DiffServ for quality of service (QoS), path protection/fast rerouting for link failure recovery (FR), and constraint-based routing for traffic engineering (TE). This, however, means that, to adequately support real-time applications, a whole suite of protocols with significant involvement of the IP core nodes needs to be developed.

This raises serious concerns about the scalability and 
complexity of using these protocols to support real-time 
applications at a global scale. 

Hence, a key question to be answered is whether it is possible to enable the above service and quality features, including QoS,

II. Literature review

The existing algorithms focus on TCP-type traffic, including both empirical algorithms and algorithms based on control theory [10]. These algorithms assume a single path, and the approaches taken are not optimization-based.

In the existing schemes, flows with different ingress-egress node pairs share the same network resources. The interaction between different flows due to the resource constraints was handled very poorly in the existing distributed traffic control laws [1].




Global Journal of Computer Science and Technology 



Page | 47 Vol. 9 Issue 5 (Ver 2.0), January 2010 



Since flows with different ingress-egress node pairs share the same network resources, the key challenge in the design of DCLs is the high degree of interaction between different flows due to the resource constraints. One existing approach to get around this is to incorporate a link congestion cost into the overall utility function, replacing the link resource constraints. The problem is then solved using a gradient-type algorithm, resulting in families of DCLs that support point-to-point multipath load balancing for rate-adaptive traffic [6, 7].

Some existing methods developed a family of DCLs based on nonlinear control theory. This family of DCLs can be applied not only to usual rate-adaptive traffic with point-to-point multipath, but also to rate-adaptive traffic with minimum service requirements and/or a maximum allowed sending rate, and to services with targeted rate guarantees, all allowing for point-to-point multipath.

The only needed feedback from the network is the number 
of congested links along the forwarding paths [2, 5]. 
Moreover, the technique applies to any utility function that 
can be expressed as a sum of concave terms. 

Because the existing family of DCLs uses the number of congested links in a forwarding path as its input, it requires explicit congestion feedback from the network. Therefore, the existing scheme can only be applied to a connection-oriented network, such as an MPLS-enabled IP network [9].

In the proposed system, the DCLs control the traffic independently at different traffic source nodes, e.g., edge nodes or end-hosts. A salient feature of this family of DCLs is that the information feedback needed from the network is minimal, i.e., whether or not a forwarding path is congested, which can be inferred at the source node itself in the same way as TCP congestion notification. This allows this family of DCLs to be operated end-to-end.

III. System model

The traffic flows can be described by a fluid flow model, where the only resource taken into account is link bandwidth. For simplicity, we first restrict ourselves to point-to-point multipath only and address the point-to-multipoint and multicast cases later.

In the system model, we consider a computer network in which calls of different types are present. A type denotes an aggregate of calls with the same ingress and egress nodes and the same service requirements, i.e., calls that share a given set of paths connecting the same ingress-egress node pair and whose service requirements are to be satisfied by the aggregate, not by individual calls. When the edge nodes coincide with the end-hosts, the control laws developed in this paper become end-to-end control laws working at the transport layer and servicing individual application flows.
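The system model can be held in a small data structure. In the sketch below, the names `CallType`, `link_loads` and `feasible` are hypothetical, and Python 3.9+ is assumed. Each call type records its ingress-egress pair, its candidate paths as lists of link identifiers, and the aggregate rate it places on each path. Under the fluid model, a link's load is the sum of the rates of the paths that cross it, and bandwidth is the only resource checked.

```python
from dataclasses import dataclass, field

@dataclass
class CallType:
    """Aggregate of calls sharing an ingress-egress pair and service requirements."""
    ingress: str
    egress: str
    paths: list[list[str]]                             # each path is a list of link ids
    rates: list[float] = field(default_factory=list)  # aggregate rate on each path

def link_loads(types: list[CallType]) -> dict[str, float]:
    """Fluid-flow model: a link's load is the sum of the rates of all paths crossing it."""
    loads: dict[str, float] = {}
    for t in types:
        for path, rate in zip(t.paths, t.rates):
            for link in path:
                loads[link] = loads.get(link, 0.0) + rate
    return loads

def feasible(types: list[CallType], capacity: dict[str, float]) -> bool:
    """Bandwidth is the only resource: compare every link load with its capacity."""
    return all(load <= capacity[link] for link, load in link_loads(types).items())
```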

A. Discretization, Delays and Quantization 

Implementing the control laws raises three issues. The control algorithms must be implemented in discrete time, finite word length leads to quantization of the possible data rate values, and the congestion information is propagated with a delay. All of these lead to a well-known phenomenon: oscillation. Even in this case, the discretized control laws remain approximately optimal.
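The toy simulation below shows how a delayed congestion signal combined with quantization produces a sustained oscillation. It is not the paper's DCL. It assumes a hypothetical additive increase/decrease rule driven by a binary congestion flag, a three-step feedback delay, and quantization of the rate to multiples of 0.5. All names and values are illustrative.

```python
def simulate_binary_feedback(capacity=10.0, step=0.5, delay=3, quantum=0.5, n_steps=200):
    """Discrete-time rate control driven by a delayed, binary congestion signal.

    At each step the source raises its rate by `step` unless the congestion
    signal, which reflects the rate `delay` steps ago, reports an overloaded
    path, in which case it lowers the rate by `step`. The new rate is then
    quantized to a multiple of `quantum` (finite word length).
    """
    rates = [0.0]
    for n in range(n_steps):
        seen = rates[max(n - delay, 0)]       # stale information reaching the source
        congested = seen > capacity
        x = rates[-1] - step if congested else rates[-1] + step
        x = max(0.0, round(x / quantum) * quantum)  # quantization and non-negativity
        rates.append(x)
    return rates
```

With these values the rate enters a limit cycle between 8.5 and 12 around the capacity of 10. Setting `delay=0` shrinks the swing to a single step (between 10 and 10.5), which isolates the contribution of the feedback delay.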

B. Congestion Detection and Notification 

To maintain the abstraction of the transport and higher layers, a source-inferred congestion detection and notification mechanism is desirable for implementing this family of DCLs in a connectionless IP network. However, until the transport or higher-layer protocol that implements this family of DCLs is defined, the exact source-inferred congestion detection and notification mechanism cannot be decided.

For example, if a DCL in this family is used in association 
with a TCP-like reliable transport protocol, a source inferred 
congestion detection and notification mechanism based on, 
for example, ACK counts can then be adopted. On the other 
hand, if the DCL is used in association with a UDP-like
unreliable transport protocol, the forwarding path congestion 
may be detected and notified by periodically sending an 
echo packet to the destination node and measuring the 
round-trip time of the echoed packet. 
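For the UDP-like case, a delay-based inference from echo packets might look like the sketch below. It is one possible heuristic and is not specified in the text. The class name, the 50% margin over the minimum observed RTT, and the exponentially weighted moving average gain of 0.125 are all assumptions.

```python
class EchoCongestionDetector:
    """Source-inferred congestion detection from echo-packet round-trip times.

    The path is flagged as congested when the smoothed RTT exceeds the smallest
    RTT seen so far (an estimate of the propagation delay) by more than `threshold`.
    """

    def __init__(self, threshold: float = 0.5, alpha: float = 0.125):
        self.threshold = threshold      # relative margin, e.g. 0.5 means 50 % above base RTT
        self.alpha = alpha              # EWMA gain used to smooth RTT samples
        self.base_rtt = float("inf")    # minimum RTT observed so far
        self.srtt = None                # smoothed RTT, initialized by the first sample

    def on_echo(self, rtt: float) -> bool:
        """Record one echoed-packet RTT in seconds; return True if the path is inferred congested."""
        self.base_rtt = min(self.base_rtt, rtt)
        if self.srtt is None:
            self.srtt = rtt
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.srtt > (1 + self.threshold) * self.base_rtt
```

The smoothing keeps a single late echo from triggering a congestion signal. With a base RTT of 20 ms, a sustained jump to 50 ms is flagged after four echoes.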

The source inferred congestion detection and notification 
approaches can also be used in the context of a connection- 
oriented network, such as an MPLS one. In addition, other 
mechanisms can be employed, e.g., mechanisms using a 
signaling protocol for congestion detection and notification. 

C. Failure Detection and Notification 

The node/link failure detection and notification may or may not be integrated with the congestion detection and notification mechanism. Again, this depends on the actual protocol that implements a DCL in this family. For
example, a source inferred congestion detection and 
notification using echo packets to infer path congestion may 
also be used to infer possible node/link failures. On the other 
hand, in an MPLS network, the path protection mechanism 
under development can be leveraged to allow failure 
detection and notification, separate from the congestion 
detection and notification mechanisms. 

D. Design Parameters 

The behavior of the algorithm under different choices of the design parameters is discussed below.

i. Oscillation Reduction Functions 

The adaptive oscillation reduction has a large impact on performance. Consider the behavior when the oscillation reduction function is held constant at the maximum value allowed for the time variation. The observed oscillation is then clearly larger in magnitude. Moreover, because of the larger oscillations, the rates converge only to a larger neighborhood of the optimum, and departures from the average target rates for AF are also larger (providing worse service to these users). On the other hand, the transient response is faster because of the larger data rate derivatives.

ii. Discretization Step 

Another parameter that affects the performance of the algorithm is the discretization step. To show its influence, it was set to 10 ms. Clearly, oscillations are also larger in this case. However, the response is still acceptable, and a smaller step could be used to limit the magnitude of the spikes.

iii. Scaling of the Utility Function

The scaling of the utility function does not alter the solution of the optimization problem at hand. It does, however, change the bounds on the quantities involved. Because of the exponential dependence on the gradient, it is advisable to choose a small value of the scaling factor such that the resulting values are of the order of 1. Simulations have shown that the algorithm is very sensitive to the scaling factor, with the amplitude of the oscillations increasing substantially when this parameter is increased.

However, convergence to a neighborhood of the optimum is still achieved, as one would expect. In addition, the AF constraints are satisfied on average, but large departures from the imposed average rate can occur for high values of the scaling factor.
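The invariance claim can be stated compactly. For a scaling factor $c > 0$ applied to a utility $U$ over a feasible set $\mathcal{X}$ (symbols introduced here for illustration only), the maximizer is unchanged while every gradient is multiplied by $c$:

```latex
\arg\max_{x \in \mathcal{X}} \; c\,U(x) \;=\; \arg\max_{x \in \mathcal{X}} \; U(x),
\qquad
\nabla\bigl(c\,U\bigr)(x) \;=\; c\,\nabla U(x), \qquad c > 0 .
```

Any quantity that depends exponentially on the gradient therefore depends exponentially on $c$. This is consistent with the reported sensitivity of the oscillation amplitude to the scaling factor.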

iv. Experimental evaluation 

The new family of DCLs provides the much-needed 
mathematical foundation that allows the use of source 
inferred congestion detection and notification to maintain 
layer abstraction. The new family of DCLs allows the rate 
control to be decoupled from the congestion detection 
mechanisms in use. This means that any queue management 
algorithm and queue scheduling discipline used in the core 
nodes can coexist with the family of DCLs running at the 
edge nodes or end-hosts. 

The implementation of any DCL in this family only needs to consider the two end nodes, provided that source-inferred congestion detection and notification is available. That said, different queue management algorithms and queue scheduling disciplines do have an impact on the overall performance of any end-to-end traffic control mechanism.

As a result, there are two key components in the 
implementation of the family of DCLs, i.e., the 
implementation of the DCL in the edge nodes or end-hosts 
and the design of source inferred congestion detection and 
notification mechanisms. The system model focuses on the issues related to the design of source-inferred congestion detection and notification mechanisms.

v. Conclusion 

The proposed family of distributed traffic control laws (DCLs) allows optimal, multiple-CoS, multipath-based rate adaptation and load balancing.

The DCLs drive the network to an operating point where a user-defined global utility function is maximized. Because the mathematical formulation accommodates both point-to-point multipath and point-to-multipoint multipath, the family of DCLs can be applied to a connectionless IP network to enable sophisticated service quality features based solely on a set of shortest paths from any given ingress node to a set of egress nodes.

A core node may be CoS- and multipath-agnostic and may employ any queue management/scheduling algorithm, e.g., simple FIFO queues, at its output ports. This family of DCLs allows fast-time-scale TE through multipath load balancing. When path failures occur, the proposed DCLs can automatically repartition the traffic optimally among the remaining paths. The proposed scheme can be applied to both connection-oriented and connectionless networks.

Abstract- Because of their sensitivity to initial conditions and control parameters and their pseudo-randomness, chaotic maps have recently been widely used in data encryption. Chaos-based cryptosystems are suitable for large-scale data encryption, such as images, videos, or audio data. This paper proposes a novel higher-dimensional chaotic system for audio encryption, in which the variables are treated as encryption keys in order to achieve secure transmission of audio signals. Because the system is highly sensitive to its initial conditions and to the variation of a parameter, and because its chaotic trajectory is highly unpredictable, much higher security is obtained. The higher dimensionality of the algorithm is used to enlarge the key space and enhance the security of the algorithm. A security analysis of the algorithm is given. The experiments show that the algorithm is sensitive to initial conditions, has a large key space, and produces a uniform distribution of the digital audio signal. They also show that the algorithm is not broken by chosen/known-plaintext attacks.
Keywords- Audio encryption; Chaos; Security; Higher 
dimensional chaotic maps; 

I. Introduction 

The techniques of secure communication, by which one can transmit confidential messages secretly, are of practical interest in several areas, including databases, internet banking, protection of communication channels, etc. Based on the structure of the encryption algorithm, cryptosystems can be classified into two categories, namely stream ciphers and block ciphers. In a stream cipher, the message is encrypted bit by bit using a secret key generator, and decryption is performed with the same algorithm and the same secret-key generator as encryption. In practice, shift registers, non-linear combination generators, clock-controlled generators, etc. are used as key generators. Unlike a stream cipher, a block cipher encrypts groups of bits of fixed length block by block. In another classification, based on the method of distributing the secret key, cryptosystems are divided into two classes: private (symmetric) key and public (asymmetric) key cryptosystems. In private key cryptosystems, the sender and receiver use the same key for encryption and decryption, respectively. These systems are efficient, and their security depends on the statistical properties of the random bit sequence and on the length of the secret key.

Most existing cryptosystems, with a few exceptions, use number theory, algebra, combinatorics, computer arithmetic, etc. as mathematical tools for constructing the encryption and decryption algorithms.

It has now been shown that, in many respects, chaotic maps have characteristics analogous to, but different from, those of conventional encryption algorithms such as DES, IDEA and RSA, which are not suitable for practical audio encryption. Only at the start of the last decade did researchers begin to use continuous and discrete chaotic dynamical systems to develop cryptosystems. Chaotic systems are characterized by sensitivity to initial conditions and system parameters, and by the unpredictability of the evolution of their orbits. These properties are exploited for encryption [1-6].

Chaotic systems used for encrypting audio files are highly unpredictable and random-looking, and this attractive feature may lead to novel chaotic applications. Most previous chaotic algorithms use XOR or XNOR as the basic encryption technique. These techniques are very weak against statistical and chosen/known-plaintext attacks, and their resistance to brute-force attacks is also questionable. To avoid these risks and increase the security level of the algorithm, we propose a higher-dimensional chaotic map [10]. Multiple look-up tables are used to encrypt the audio files, and all values are encrypted using the cipher block chaining method. Arnold's cat map is a good candidate for permutation, so it is extended to an eight-dimensional version, called the 8D cat map, which is then used for encryption. Taking advantage of the exceptionally good mixing properties of the chaotic 8D cat map and its sensitivity to initial conditions and parameters, the proposed scheme introduces a new architecture for chaos-based cipher block chaining encryption. This architecture links all previous values of the audio signal to the current value, so the current audio signal value cannot be decrypted without knowing the previous value. This adds one more level of security. A higher dimension and a larger number of keys increase both the complexity and the key space of the algorithm.

II. Audio encryption scheme based on higher dimensional chaotic map

In this paper, a chaos-based audio encryption system in the framework of the cipher block chaining architecture is proposed. The block diagram of the system is shown in fig. . An analog audio input is sampled at a frequency well above the Nyquist frequency of the signal. Then 16-bit quantization is used to convert the analog signal into its equivalent decimal value. By masking these data with a random key stream generated by a chaos-based pseudo-random key stream generator, the corresponding encrypted audio is formed. The details of each component are discussed in the following sections. As our simulations demonstrate, this approach is more secure in all the respects tested.

A. Chaos-Based Look-Up Tables

Chaos-based look-up tables are used for encrypting audio files, taking advantage of the exceptionally good mixing properties of chaotic maps and their sensitivity to initial conditions. Among the many chaotic maps, Arnold's cat map [7] is found to be a good candidate for permutation. It is therefore extended to a higher-dimensional version, called the nD cat map, and then used for this purpose. With this map, the nth and (n+1)th data values are much more strongly de-correlated than with 2D and 3D maps, so the security of our algorithm is much higher than that obtained with lower-dimensional chaotic maps. A higher-dimensional cat map is formed as follows:

where the $b$ are integers in $[0, 2^L - 1]$.

This higher-dimensional cat map is used as our pre-processing unit for generating look-up tables. The initial key stream is arranged in tables $\{T_1, T_2, T_3, \ldots, T_m\}$, where $m$ is the number of tables used for encrypting the audio signal and $L$ is the number of bits used by the key stream. In this algorithm, $L = 16$ and $m = 8$.

Table 1. Look-up table generated using the higher-dimensional cat map.
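The 8D cat-map matrix is not reproduced in this text, so the sketch below substitutes the classical 2D Arnold cat map on a 256 × 256 grid. That grid has exactly 2^16 cells, one per 16-bit sample value, which makes each table a permutation of 0..65535 and therefore invertible for decryption. It is a stand-in, not the authors' construction. The key parameter pairs `KEY_PARAMS`, the number of `rounds` and all function names are hypothetical.

```python
N = 256  # 256 x 256 grid: one cell per 16-bit value (2**16 = 65,536)

def cat_map_table(a: int, b: int, rounds: int = 2) -> list[int]:
    """Build a 16-bit look-up table by permuting a 256x256 grid with Arnold's cat map.

    The map (x, y) -> (x + a*y, b*x + (a*b + 1)*y) mod N has determinant 1,
    so it is a bijection and the resulting table is a permutation of 0..65535.
    """
    table = [0] * (N * N)
    for v in range(N * N):
        x, y = divmod(v, N)  # split the 16-bit value into grid coordinates
        for _ in range(rounds):
            x, y = (x + a * y) % N, (b * x + (a * b + 1) * y) % N
        table[v] = x * N + y
    return table

def build_tables(params: list[tuple[int, int]]) -> list[list[int]]:
    """One look-up table T_j per hypothetical key pair (a_j, b_j); m = len(params)."""
    return [cat_map_table(a, b) for a, b in params]

def invert(table: list[int]) -> list[int]:
    """Inverse permutation, needed on the decryption side."""
    inv = [0] * len(table)
    for v, t in enumerate(table):
        inv[t] = v
    return inv

# Hypothetical key parameters for m = 8 tables (not taken from the paper).
KEY_PARAMS = [(1, 1), (2, 3), (3, 5), (5, 7), (7, 11), (11, 13), (13, 17), (17, 19)]
```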



B. Encryption Function 

In the encryption block, a uniformly distributed random number sequence is generated to select the table used for encryption. After a table is selected, the digital value of the audio signal is mapped to the iteration number of the chaotic sequence. To encrypt the nth digit, the (n−1)th cipher digit is added to the nth plain digit, and the resulting value is mapped through the table. This adds one more level of security because all the samples are interlinked: to decrypt the nth digit, the (n−1)th cipher digit must be known first. This is called the cipher block chaining mode of encryption. The block diagram of the encryption function is given in Fig. 2.




$P_i$ = Plain digitized audio value
$E_k$ = Encryption algorithm
$C_i$ = Digital cipher audio value
$K_e$ = Key

Fig. 2. The construction of the Cipher Block Chaining.
In this block diagram, the previous cipher digit is added to the current plain digit of the audio signal. The sum can sometimes exceed 65,535 and would then require more than 16 bits. To avoid this, the result is reduced modulo 65,536 using the following formula:



B. Key Sensitivity 

Secure cryptosystems require high key sensitivity. This means that the ciphertext cannot be decrypted correctly even if there is only a slight difference between the encryption and decryption keys. To some extent, this guarantees the security of a cryptosystem against brute-force attacks.



$$MP_n = (C_{n-1} + P_n) \bmod 65{,}536 \qquad (3)$$

where $MP_n$ is the modified plain digit, which is encrypted with key $K_e$ to give the cipher digit $C_n$. After encryption, the cipher values of the audio are brought back into the original range. In this way, the whole audio file is encrypted. Decryption reverses the encryption process.
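The chaining in Eq. (3) and its reversal can be sketched as follows, operating on unsigned 16-bit values 0..65535. To keep the block self-contained, the look-up tables are stand-in seeded random permutations rather than cat-map tables. Python's `random` module is used for the table-selection sequence purely for illustration; it is not a cryptographically secure generator. The initial value `iv` (standing in for $C_0$, which the text does not specify), the seeds and the function names are assumptions.

```python
import random

MOD = 65_536  # 16-bit samples, values 0..65535

def make_tables(m: int = 8, seed: int = 1) -> list[list[int]]:
    """Stand-in look-up tables: m seeded random permutations of 0..65535."""
    rng = random.Random(seed)
    tables = []
    for _ in range(m):
        t = list(range(MOD))
        rng.shuffle(t)
        tables.append(t)
    return tables

def encrypt(plain, tables, selector_seed=7, iv=0):
    rng = random.Random(selector_seed)   # uniform table-selection sequence
    prev, out = iv, []
    for p in plain:
        mp = (prev + p) % MOD            # Eq. (3): MP_n = (C_{n-1} + P_n) mod 65,536
        c = tables[rng.randrange(len(tables))][mp]
        out.append(c)
        prev = c                         # chaining: the next sample depends on this cipher value
    return out

def decrypt(cipher, tables, selector_seed=7, iv=0):
    inverses = []
    for t in tables:                     # invert each permutation table
        inv = [0] * MOD
        for v, x in enumerate(t):
            inv[x] = v
        inverses.append(inv)
    rng = random.Random(selector_seed)   # replay the same table-selection sequence
    prev, out = iv, []
    for c in cipher:
        mp = inverses[rng.randrange(len(tables))][c]
        out.append((mp - prev) % MOD)    # undo the chaining addition
        prev = c
    return out
```

Decryption needs the same selection seed and the previous cipher value $C_{n-1}$, which the receiver already holds. A table lookup is used for $E_k$ here only because it is the simplest invertible mapping consistent with the look-up-table description.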



III. Security description

In the proposed cryptosystem, the 8D chaotic cat map is used to encrypt audio files. The security of the cryptosystem is determined by the key space, key sensitivity and plaintext sensitivity of the chaotic map used, so the chaotic map's properties are closely related to the cryptosystem's security. First, the map's parameter is used as the confusion key, so parameter sensitivity is closely related to key sensitivity. The higher the parameter sensitivity, the higher the key sensitivity and the stronger the cryptosystem. Thus, a chaotic map with higher parameter sensitivity is preferred in this cryptosystem. Second, the higher the initial-value sensitivity, the smaller the correlation between adjacent samples and the more random the encrypted audio file. Therefore, a chaotic map with higher initial-value sensitivity is preferred in this cryptosystem. Finally, the number of encryption iterations is also closely related to the cryptosystem's security: the more iterations, the larger the key space, provided that different keys are used in different iterations.

iv. Security analysis 

A. Key Space 

In the proposed cryptosystem, 8 keys are used for encryption. If n is the number of iterations and different keys are used in different iterations, then the key space is

$$S = (k_1 \cdot k_2 \cdot k_3 \cdots k_8)^n. \qquad (4)$$

According to Eqs. (1)-(3), the key space is determined by the parameter space of the chaotic map. As can be seen, the cryptosystem's key space $S$ increases with the parameter space $k_1 k_2 \cdots k_8$ or with the number of iterations $n$. The parameter space can be enlarged by increasing the number of keys, which is 8 here. The number of iterations can be chosen according to security and complexity requirements. In this algorithm, 16-bit quantization is used, so $n = 65{,}536$. For a $1 \times N$ audio file with $L$ quantization levels and different keys, the complexity of the cat map is $N^n \cdot L^n$.
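Equation (4) is easier to read in bits. The sketch below uses a hypothetical function name and computes $\log_2 S = n \sum_i \log_2 k_i$ without forming the enormous integer $S$. It reads $k_i$ as the number of admissible values of the ith key, which is an interpretation of the text rather than something it states. For illustration only, each $k_i$ is taken as $10^{13}$, since the test key 1234353463465 used below has 13 decimal digits.

```python
import math

def key_space_bits(key_space_sizes: list[int], n: int) -> float:
    """log2 of S = (k_1 * k_2 * ... * k_m) ** n, computed without building S itself."""
    return n * sum(math.log2(k) for k in key_space_sizes)

# Hypothetical example: eight keys, each with 10**13 possible values.
one_iteration = key_space_bits([10**13] * 8, 1)   # roughly 345.5 bits
full = key_space_bits([10**13] * 8, 65_536)       # roughly 2.26e7 bits
```

This counts parameter combinations only. It equals the brute-force search effort only if every combination yields a distinct encryption.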
Fig. 3. Key sensitivity test: result 1. (a) Original audio file; (b) encrypted audio file, key1 = 1234353463465; (c) encrypted audio file, key1 = 1234353463464; (d) difference audio file.

Fig. 4. Key sensitivity test: result 2. (a) Original audio file; (b) encrypted audio file, key1 = 1234353463465; (c) decrypted audio file, key1 = 1234353463465; (d) decrypted audio file, key1 = 1234353463464.

1) First, an audio file is encrypted using the test keys, with key1 = “1234353463465.”

2) Then, the least significant bit of key1 is changed, giving “1234353463464,” while the remaining seven keys are kept the same, and the same audio file is encrypted again.

3) Finally, the two audio files encrypted with the two slightly different keys are compared. The audio file encrypted with the key “1234353463465” differs by 99.99% from the one encrypted with the key “1234353463464” in terms of quantized digital values, although only one digit of one key differs. Fig. 3 shows the test result.
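The text does not spell out how the percentage difference is computed. A natural reading, assumed here, is the fraction of quantized sample positions at which the two ciphertexts differ. The function name is hypothetical.

```python
def percent_difference(a: list[int], b: list[int]) -> float:
    """Percentage of sample positions at which two equal-length quantized sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * differing / len(a)
```

As a reference point, two independent, uniformly random 16-bit sequences differ at a fraction 1 − 1/65,536 of positions on average, i.e. about 99.998%. A result near 99.99% is therefore what unrelated ciphertexts would show.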



C. Encryption On Uniform Audio File 




Fig. 5. Uniform audio encryption: (a) uniform audio file; (b) encrypted uniform audio file.



An audio file with a uniform sound is encrypted using this algorithm. After encryption, no patterns or uniformity remain in the encrypted audio file. This indicates that even for uniform audio the algorithm produces random-looking output, which supports a higher security level.

D. Chosen/Known-Plaintext Attack 



Chosen/known-plaintext attacks are attacks in which one can access or choose a set of plaintexts and observe the corresponding ciphertexts. In today's networked world, such attacks occur more and more frequently. A cipher with a high level of security must resist both known-plaintext and chosen-plaintext attacks. Most XOR-based techniques are not robust against this attack [10]. In particular, when the secret key is not changed for each plaintext, these methods are insecure against chosen/known-plaintext attacks. The mask audio $I_m$ is obtained by simply XOR-ing the plain audio $I$ with its corresponding cipher audio $I'$. If XOR-ing the mask $I_m$ with an unknown cipher audio $J'$ yields the unknown plain audio $J$, the algorithm fails against the chosen/known-plaintext attack. Otherwise, it withstands this attack. Fig. 7 demonstrates an unsuccessful chosen/known-plaintext attack on the proposed algorithm.
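The reason XOR-only schemes fail this mask test can be checked directly. If the same key stream K encrypts two files, the mask I ⊕ I′ equals K, and XOR-ing it with J′ returns J. The sketch assumes a single reused 16-bit key stream. The function names and seeded data are hypothetical.

```python
import random

def xor_encrypt(samples: list[int], keystream: list[int]) -> list[int]:
    """A purely XOR-based cipher: C = P xor K (decryption is the same operation)."""
    return [p ^ k for p, k in zip(samples, keystream)]

def mask_attack(known_plain, known_cipher, unknown_cipher):
    """Build the mask I_m = I xor I' and apply it to an unknown cipher J'."""
    mask = [p ^ c for p, c in zip(known_plain, known_cipher)]
    return [c ^ m for c, m in zip(unknown_cipher, mask)]

rng = random.Random(0)
keystream = [rng.randrange(65_536) for _ in range(1000)]  # reused key stream (the weakness)
I = [rng.randrange(65_536) for _ in range(1000)]          # known plain audio
J = [rng.randrange(65_536) for _ in range(1000)]          # secret plain audio
recovered = mask_attack(I, xor_encrypt(I, keystream), xor_encrypt(J, keystream))
```

Here `recovered` equals `J` exactly. The same test applied to a chained, table-based cipher does not reduce the mask to a fixed key stream, because each output depends on earlier cipher values and the selected table.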





Fig. 7. Unsuccessful chosen/known-plaintext attack on the proposed algorithm: (c) XOR mask; (d) unknown cipher audio file 2; (e) failed attempt to crack audio file 2.

V. Conclusions 

Telemedicine can provide access to health care in previously unserved or underserved areas. These areas include rural, inner-city and barrier locations, all of which use the Internet as the basic medium for transferring information. However, the Internet is a public medium from which anyone can obtain information from anywhere, so privacy is very limited. To improve privacy over the public network, encryption is used. Most previous algorithms were written for encrypting only text messages. Applying the same methods to media files such as medical audio files would not be efficient, because the correlation within an audio file is higher than within a text file. We therefore use a higher-dimensional dynamical system.

The dynamic response of a chaotic system is highly sensitive to initial values and to the variation of a parameter, and the chaotic trajectory is very unpredictable. Therefore, this paper proposes a higher-dimensional chaotic system for encrypting medical audio files in telemedicine, where medical files are transferred over the unsecured Internet and telephone networks. The correlation coefficient was calculated and the FIPS PUB 140-1 tests were conducted on the higher-dimensional chaotic system of this paper. These show that the proposed chaotic system has optimal scatter characteristics, which indicates that our algorithm is more secure. In this algorithm the number of keys is increased, so the key space and the complexity of the algorithm increase accordingly. Even in a public channel, if the encrypted audio files are stolen, an intruder cannot decrypt them and recover the original audio file. The above security analysis and attack tests show that this algorithm gives better results, even when the audio file is uniform. Therefore, this algorithm is useful for mission-critical medical transactions over unsecured public networks.

Abstract- Efficient communications are crucial for disaster response and recovery. However, most current public safety land mobile radio (LMR) networks provide only narrowband voice service, with limited support for low-speed data services, owing to packet delay and loss and low resource utilization. Techniques that achieve high resource utilization are therefore needed. The Session Initiation Protocol (SIP) and a joint radio resource management framework are current technologies for supporting interoperability between cellular and LMR networks, with SIP providing a seamless handoff scheme. In this paper, we study how to enhance the interoperability of LMR with commercial wireless cellular networks, through which a wide variety of benefits can be offered to disaster responders, including new multimedia services, increased data rates, and low-cost devices. Our approach uses two dynamic schedulers, FDS and IDS, to reduce packet delay and loss in interoperable cellular and LMR networks. In addition, an optimal bandwidth utilization scheme is proposed to maximize overall radio resource utilization. In our first experiment, we apply the proposed approach to interoperable cellular and LMR networks, where the two dynamic schedulers, FDS and IDS, reduce packet delay and loss. The proposed approach is novel in interoperable cellular and LMR networks and is the first approach to tackle packet delay and loss in heterogeneous mobile wireless networks. The proposed scheduler helps guarantee service availability, service continuity and quality of service (QoS) for disaster responders. Our approach also achieves high resource utilization in the same network.

Keywords - Interoperability, public safety land mobile radio, packet delay and loss.

I. Introduction 

Disaster response and recovery require timely interaction and coordination of disaster responders in order to save
lives and property. Efficient communications are crucial 
during disasters. With recent advances of wireless 
technologies, mobile wireless networks play an increasingly 
important role in disaster response. Currently, public safety 
land mobile radio (LMR) is used by public safety agencies 
for coordinating teams and providing rapid emergency 
response. 

During disasters, efficient communications are crucial for disaster responders in disaster response and recovery. For example, it is desirable for disaster responders to have access to the Internet to share real-time multimedia information with off-site commanders and specialists providing expert assistance. However, these communication services are not available in the current public safety LMR.
Whereas in commercial cellular networks, less service 
availability means less revenue; in public safety arena, less 
service availability may affect lives. Therefore, it is 
attractive to enhance the interoperability of these two 
wireless networks, by which a wide variety of benefits can 
be offered to disaster responders, including new multimedia 
services (e.g., video), increased user data rates and low cost 
devices. 

A. Interoperable Cellular and Public Safety LMR 

Since natural disasters or terrorist attacks often occur in a localized region, we assume that the coverage area of the LMR lies within the coverage of the cellular network. The mobile devices used by disaster responders are equipped with multiple radio interfaces that enable them to access both the LMR and the cellular network within the coverage of the LMR, whereas commercial users can access only the cellular network. IP-based multimedia services (e.g., video streaming) are available to disaster responders via the cellular network, and mission-critical services (e.g., tactical group voice) are provided to them via the LMR. Since disaster responders are free to move within the interoperable LMR/cellular systems, this integration must support handoff between the two networks to provide ongoing service continuity. In this interoperable system, disaster responders can communicate efficiently using state-of-the-art applications. In the interoperable cellular and public safety LMR networks, disaster responders can access services in cellular networks that are not available in public safety LMR networks, which increases service availability. Furthermore, when a disaster responder moves out of the coverage of public safety LMR networks during an ongoing communication session, the session should be handed off to the cellular network instead of being dropped, to provide communication continuity.

Several schemes have been proposed for the interoperability of heterogeneous wireless networks. The authors of [6] propose a location management scheme, including location update and paging, for heterogeneous systems. The optimal conditions under which vertical handoff should be performed are studied in [7]. The authors of [8] and [9] study several admission control schemes in cellular/WLAN integrated networks to improve the performance of voice and data services. Scalable routing techniques have been proposed for heterogeneous mobile networks. The commercial wireless cellular user community is two orders of magnitude larger than the public safety LMR user base. Consequently, R&D investments in commercial wireless cellular networks dwarf those made in public safety LMR networks.




Fig. 1: Interoperable wireless cellular and public safety 
LMR networks 

B. State of the Art 

Many methods have been proposed to enhance the 
interoperability between cellular and LMR networks based 
on the Session Initiation Protocol (SIP) [11] and a joint radio
resource management framework, which are different from 
the schemes in previous work [5] -[10]. SIP is designed by 
the Internet Engineering Task Force (IETF) to provide 
application-layer signaling for voice and multimedia session 
management, which can achieve true end-to-end mobility 
management. In addition, SIP has excellent extensibility and 
scalability due to its operation at the highest layer and use of 
text-based control messages. Several wireless technical fora 
(e.g., 3GPP, 3GPP2 and MWIF) have agreed to use SIP to 
provide session management. However, traditional SIP- 
based handoff scheme may have considerable handoff 
delays due to the exchange of application layer messages, 
which may be unacceptable for real-time multimedia 
services [12]. 

All of these works address handoff delay, location management, admission control or scalable routing, whereas the need here is to reduce packet delay and loss. There is a critical need for an approach that can optimize packet delay and loss to improve multimedia services in interoperable heterogeneous mobile wireless networks and thus achieve QoS.

ii. Research approach 

This research work addresses the problem of supporting disaster response and recovery in interoperable heterogeneous mobile wireless networks. This goal is achieved by enhancing radio resource management and reducing packet delay and loss. We rigorously formulate the problem of enhancing radio resource management and reducing handoff delay and packet loss in the context of heterogeneous mobile wireless networks. We then present a feedback-based dynamic scheduler, built on a discrete-time control-theoretic approach, that provides guarantees on quality of service and service availability.

A. Feedback Dynamic Scheduler 

The FDS and IDS algorithms are designed using feedback control theory. We assume that both algorithms run at the hybrid coordinator (HC) and allocate the WLAN channel bandwidth to wireless stations hosting real-time applications, using HCF controlled channel access (HCCA) functionalities. This allows the HC to assign transmission opportunities (TXOPs) to access categories (ACs) by taking into account their specific time constraints and transmission queue levels [13]. We consider a WLAN system consisting of an access point and a set of quality of service enabled mobile stations (QSTAs). Each QSTA has up to four queues. Let $T_{CA}$ be the time interval between the starts of two successive controlled access phases (CAPs). Every interval $T_{CA}$, which is assumed to be constant, the HC must allocate the bandwidth that will drain each queue during the next CAP. We assume that at the beginning of each CAP, the HC is aware of all the queue levels $q_i$, $i = 1, \ldots, M$, at the beginning of the previous CAP, where $M$ is the total number of traffic queues in the WLAN.

The following discrete-time linear model describes the dynamics of the queue:

$$q_i(n+1) = q_i(n) + u_i(n)\,T_{CA} + d_i(n)\,T_{CA}, \qquad i = 1, \ldots, M \qquad (1)$$

where $q_i \ge 0$ is the queue level at the beginning of the nth CAP, and $u_i \le 0$ is the average depletion rate (i.e., its absolute value represents the bandwidth assigned to drain the queue). The term $d_i(n) = d_i^s(n) - d_i^{CP}(n)$ is the difference between $d_i^s(n) \ge 0$, the average input rate at the queue during the nth $T_{CA}$ interval, and $d_i^{CP}(n) \ge 0$, the average output rate at the queue during the nth $T_{CA}$ interval. The input $d_i(n)$ is unpredictable, since it depends on the behavior of the source that feeds the ith queue and on the number of packets transmitted. Thus, from a control-theoretic perspective, $d_i(n)$ can be modeled as a disturbance [25]. Without loss of generality, the following piecewise-constant model for the disturbance $d_i(n)$ can be assumed:

$$ d_i(n) = \sum_{j=0}^{\infty} \Delta d_j \cdot 1(n - t_j) \qquad (2) $$

where 1(n) is the unit step function. Because of assumption (2), the linearity of the system described by (1), and the superposition principle that holds for linear systems, we will design the feedback control law by considering only a step disturbance: d_i(n) = d_0 · 1(n) [25].
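To make the queue model (1) and the piecewise-constant disturbance (2) concrete, the following Python sketch iterates the linear recursion for a single queue. It is an illustration only: the function names, the encoding of the disturbance as a list of (t_j, Δd_j) pairs, and the numbers in the example are hypothetical. Queue levels are deliberately not clipped at zero, so the sketch follows the linear model rather than a real buffer.

```python
from typing import Callable, List, Sequence, Tuple


def unit_step(n: int) -> float:
    """Unit step 1(n): 1 for n >= 0, 0 otherwise."""
    return 1.0 if n >= 0 else 0.0


def piecewise_disturbance(jumps: Sequence[Tuple[int, float]]) -> Callable[[int], float]:
    """Build d(n) = sum_j delta_d_j * 1(n - t_j) from (t_j, delta_d_j) pairs, as in model (2)."""
    def d(n: int) -> float:
        return sum(delta * unit_step(n - t_j) for t_j, delta in jumps)
    return d


def simulate_queue(d: Callable[[int], float], u: Callable[[int], float],
                   t_ca: float, steps: int, q0: float = 0.0) -> List[float]:
    """Iterate q(n+1) = q(n) + d(n)*T_CA + u(n)*T_CA (model (1)) for `steps` CAPs.

    No clipping at zero is applied, so this is the linear model, not a real buffer.
    """
    q = [q0]
    for n in range(steps):
        q.append(q[n] + d(n) * t_ca + u(n) * t_ca)
    return q


if __name__ == "__main__":
    # Hypothetical numbers: input rate steps to 100 at n = 0 and drops by 50 at n = 5; no control.
    d = piecewise_disturbance([(0, 100.0), (5, -50.0)])
    print(simulate_queue(d, lambda n: 0.0, t_ca=0.5, steps=8))
```

Without control (u = 0) the queue grows by d(n)·T_CA every interval, here 50 per step and then 25 per step after the second jump, which is why a feedback law on u_i is needed.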

B. Closed-Loop Control Scheme 

Our goal is to design a control law that drives the queuing delay T_i, experienced by each frame going through the ith queue, to a desired target value T_i^T that represents the QoS requirement of the AC associated with the queue. We will consider the closed-loop control system shown in Fig. 2, where the set point q_i^T has been set equal to zero, which means that we would ideally target empty queues.

Fig 2: Closed loop control scheme

Regarding the transfer function G_i(z) of the controller, we will focus on two very simple controllers: a proportional (P) controller,

$$ G_i(z) = k_{p_i}, $$

and a proportional-integral (PI) controller,

$$ G_i(z) = k_{p_i}\left(1 + \frac{1}{T_{I_i}}\,\frac{z}{z-1}\right). $$
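The two controllers can be compared numerically on the closed loop of Fig. 2. The sketch below is a hypothetical illustration, not the authors' implementation: it applies a step disturbance d_i(n) = d_0 · 1(n) to model (1) and computes the next depletion rate from the current queue level. For the P controller it uses u_i(n+1) = −k_{p_i} q_i(n); for the PI controller it assumes the positional time-domain form u_i(n+1) = −k_{p_i}[q_i(n) + (1/T_{I_i}) Σ_{j=0}^{n} q_i(j)], which is what G_i(z) yields under the same one-interval delay. The gains and rates in the example are arbitrary values chosen to keep the loop stable.

```python
from typing import List, Optional


def closed_loop(k_p: float, t_i: Optional[float], d0: float,
                t_ca: float, steps: int) -> List[float]:
    """Simulate one queue of model (1) under a step disturbance d(n) = d0 * 1(n).

    t_i=None gives the P controller u(n+1) = -k_p * q(n);
    a finite t_i gives the PI law u(n+1) = -k_p * (q(n) + sum_{j<=n} q(j) / t_i).
    Returns the queue levels q(0), ..., q(steps).
    """
    q = [0.0]        # q(0): empty queue at start
    u = [0.0]        # u(0): no bandwidth assigned yet
    q_sum = 0.0      # running sum q(0) + ... + q(n) used by the integral action
    for n in range(steps):
        # Queue update, model (1), with d(n) = d0 for n >= 0.
        q.append(q[n] + d0 * t_ca + u[n] * t_ca)
        # u(n+1) is computed from q(n): one interval of delay in the loop.
        q_sum += q[n]
        integral = q_sum / t_i if t_i is not None else 0.0
        u.append(-k_p * (q[n] + integral))
    return q


if __name__ == "__main__":
    p_only = closed_loop(k_p=2.0, t_i=None, d0=10.0, t_ca=0.1, steps=200)
    pi = closed_loop(k_p=2.0, t_i=10.0, d0=10.0, t_ca=0.1, steps=400)
    print(p_only[-1], pi[-1])
```

With the P controller the queue settles at the nonzero level d_0/k_p (5.0 for these values), because a constant depletion rate −d_0 can only be sustained by a constant queue. The integral action removes this offset and drives the queue toward the zero set point, which matches the goal of targeting empty queues.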

C. Computational Complexity of the Bandwidth Allocation Algorithms

Herein, we estimate the computational complexity of the proposed allocation algorithms.

Proposition 1: In a WLAN system with M active traffic streams, the computational complexity of the FDS algorithm is O(2M).

Proof: Every time interval T_CA, the HC computes the bandwidth assignment for each of the M active traffic streams. With FDS, from Fig. 2, the control law is

$$ u_i(n+1) = -k_{p_i}\, q_i(n) \qquad (3) $$

Thus, (3) becomes

$$ \mathrm{TXOP}_i(n) = \;\ldots \qquad (4) $$

As a consequence, a single bandwidth assignment consists of two multiplications and one sum: the first multiplication accounts for the control term, and the second estimates the protocol overhead. Thus, we need 2M multiplications plus M sums for each T_CA interval, i.e., the complexity is O(2M).

Proposition 2: In a WLAN system with M active traffic streams, the computational complexity of the IDS algorithm is O(4M).

Proof: For each active stream, the HC computes the bandwidth assignment. When IDS is used, the control law is

$$ u_i(n+1) = -k_{p_i}\left[ q_i(n) + \frac{1}{T_{I_i}} \sum_{j=0}^{n} q_i(j) \right] \qquad (5) $$

which can also be written as

$$ u_i(n+1) = u_i(n) + k_{p_i}\, q_i(n-1) - k_{p_i}\left(1 + \frac{1}{T_{I_i}}\right) q_i(n) \qquad (6) $$

Thus, (6) becomes

$$ \mathrm{TXOP}_i(n) = \;\ldots \qquad (7) $$

Now, considering that the overhead is estimated using one multiplication, a single bandwidth assignment consists of 4 multiplications and 3 sums. Consequently, we need 4M multiplications plus 3M sums for each T_CA interval. Thus, the computational complexity is O(4M).

From the above propositions, we can conclude that the computational complexities of both FDS and IDS scale linearly with the number of active streams. Thus, such schemes can be easily implemented in real wireless network interface cards.

III. Evaluation and preliminary results

Table I reports the average and peak superframe utilization in HCCA mode, defined as the sum of the TXOPs allocated during CAPs divided by the superframe duration. It shows that the Simple scheduler requires the highest average quota of WLAN resources. The reason is that the Simple scheduler provides a CBR service and therefore does not adapt the quota of allocated resources to the actual load.

Fig 3: Packet delay

For the same reason, the peak superframe utilizations achieved by the Simple scheduler for a = 10 and a = 12, i.e., at high traffic load, are smaller than those provided by IDS and FDS. This result highlights that the proposed control schemes use the bandwidth more efficiently and allow the bandwidth requirements of the real-time flows to be tracked.

Table I Results

IV. Conclusion

We have studied the interoperability problem between public safety LMR networks and commercial cellular networks for disaster response. Interoperability can be enhanced by using S-SIP and optimal radio resource management. We have presented a feedback dynamic scheduler mechanism to maximize overall radio resource utilization while guaranteeing QoS, e.g., reduced packet loss.

Further study is in progress to consider other QoS requirements in interoperable cellular/LMR wireless networks.
Abstract- In this paper, the possible effects of congestion in wireless infrastructure LANs are discussed. The paper presents discrete-event simulations that provide detailed, accurate network simulation results, and a wide variety of network statistics for congestion control in wireless LANs are observed. The software simulation package OPNET (Optimized Network Engineering Tool) can best be described as a set of decision support tools, providing a comprehensive development environment for the specification, simulation and performance analysis of communication networks, computer systems and applications, and distributed systems. Discrete-event simulations are used as the means of analyzing system performance and behavior. OPNET simulations were carried out to estimate the effect of congestion on the global performance of the network model. A trade-off among various congestion parameters, such as dropped data, load, throughput, retransmission attempts and received data traffic, has been observed by creating different scenarios. Simulations are carried out at an 11 Mbps data rate for 900 simulation seconds. The paper has six sections: Section I introduces wireless LANs and the general causes of congestion, Section II deals with the problem of congestion in the wireless environment, Section III suggests possible solutions to the problem of congestion, Sections IV and V deal with the simulation and results, and Section VI concludes the paper by summarizing the important results.

Keywords- Wireless LAN, IEEE 802.11, OPNET 

I. Introduction 

In wireless LANs, congestion is a much more critical problem than in wired LANs, because the error rate is much higher in wireless LANs and the network cannot tolerate even a single collision, which would drastically reduce throughput. Also, unlike in wired networks, congestion measurement and analysis in IEEE 802.11 wireless networks is more difficult due to factors such as time-variant channel capacity, contention among neighboring nodes, interference, the variable quality of radio signals, transmitted power, etc. In addition, detecting collisions in the wireless medium is not always possible. Congestion occurs when the amount of data sent to the network exceeds the available capacity: the routers are no longer able to cope with the demand and begin losing packets. At very high traffic rates, performance collapses completely and almost no packets are delivered. Congestion can be brought about by several factors, viz. shortage of buffer space, slow links and slow processors [1-2].

A. Shortage of Buffer Space 

If large-capacity buffers are used to compensate for the shortage of buffer space, many short-term congestion problems will be solved, but this will cause undesirably long delays.
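The trade-off above (bigger buffers absorb short bursts but make packets wait longer) shows up even in a toy drop-tail FIFO model. The slot-based queue, the one-packet-per-slot service rate, the function name and the arrival pattern below are illustrative assumptions, not part of the cited works.

```python
from collections import deque
from typing import List, Tuple


def fifo_buffer(arrivals: List[int], capacity: int,
                service_per_slot: int = 1) -> Tuple[int, float]:
    """Drop-tail FIFO buffer in discrete time slots.

    arrivals[k] packets arrive in slot k; up to service_per_slot packets leave per slot.
    Returns (packets dropped, mean queuing delay in slots of the delivered packets).
    """
    queue: deque = deque()     # arrival slot of each buffered packet
    dropped = 0
    delays: List[int] = []
    for slot, count in enumerate(arrivals):
        for _ in range(count):
            if len(queue) < capacity:
                queue.append(slot)
            else:
                dropped += 1   # buffer full: the arriving packet is lost
        for _ in range(min(service_per_slot, len(queue))):
            delays.append(slot - queue.popleft())
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return dropped, mean_delay


if __name__ == "__main__":
    bursty = [4, 0, 0, 0] * 3          # a burst of 4 packets every 4 slots
    for cap in (1, 4):
        print(cap, fifo_buffer(bursty, capacity=cap))
```

With this arrival pattern, a one-packet buffer drops 9 of the 12 packets but every delivered packet waits 0 slots, while a four-packet buffer drops nothing and the mean wait rises to 1.5 slots.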

B. Slow Links 

Although one might expect congestion caused by slow links to be solved once high-speed links become available, this is not always the case: increases in link bandwidth can sometimes aggravate the congestion problem, because higher-speed links may make the network more unbalanced. A higher-speed load can make the congestion condition in the switch worse [3-5].

C. Slow Processors 

When processor speed is improved, faster processors transmit more data per unit time. If several nodes begin to transmit to one destination simultaneously at their peak rate, the target will soon be overwhelmed.

II. Congestion in wireless environment

Traditional problems of wireless communications and wireless networking are:

1) The channel is unprotected from outside signals.

2) The wireless medium is significantly less reliable than wired media.

3) The channel has time-varying and asymmetric propagation properties.

4) Hidden-terminal and exposed-terminal phenomena may occur.

In the event of packet loss, appropriate action is not easily taken, as identifying the cause of the loss is difficult.

Various mechanisms have been proposed to help classify the reason for packet loss, but all add extra complexity, may not be compatible with existing protocols, and none seems to cover all possible causes [5-7].

III. Possible solutions

There are two general solutions to the problem of congestion:

1) Congestion avoidance

2) Congestion control

Congestion avoidance attempts to predict when congestion is about to occur and reduces the transmission rate at that time. The algorithm should operate so as to keep the response-time-versus-load and throughput-versus-load curves to the left of the knee in Fig 1.1.

Congestion control attempts to take fuller advantage of the network resources by transferring data at a rate close to the capacity of the network. The capacity of the network can be viewed as the point at which any increase in traffic will increase the delay but not the throughput. Congestion control algorithms, like that of TCP, attempt to increase traffic until the capacity of the network is reached and then slow the transmission rate. Thus, these algorithms attempt to operate to the left of the cliff in Fig. 1.1.
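As one concrete instance of an algorithm that increases traffic until capacity is reached and then slows down, here is a simplified additive-increase/multiplicative-decrease (AIMD) loop of the kind used by TCP. The fixed capacity, the rule that a congestion signal occurs exactly when the rate exceeds capacity, the step sizes and the function name are simplifying assumptions; real TCP reacts to loss or delay signals observed in the network.

```python
from typing import List


def aimd(capacity: float, rounds: int, increase: float = 1.0,
         decrease: float = 0.5, start: float = 1.0) -> List[float]:
    """Return the sending rate in each round under a simplified AIMD rule."""
    rate = start
    history: List[float] = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:
            # Congestion signal (e.g. a loss): back off multiplicatively.
            rate *= decrease
        else:
            # No congestion: probe for more bandwidth additively.
            rate += increase
    return history


if __name__ == "__main__":
    print(aimd(capacity=10.0, rounds=20))
```

The rate climbs from 1 to 11, is halved to 5.5 once it exceeds the capacity, and then climbs again, producing the familiar sawtooth that hovers around the capacity without staying far beyond it.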




IV. Simulation 



The occurrence of a high density of nodes within a single collision domain of an IEEE 802.11 wireless network can result in congestion, causing a significant performance bottleneck. Effects of congestion include drastic drops in network throughput, unacceptable packet delays, packet drops, retransmissions, and session disruptions. An OPNET simulation was carried out to estimate the effect of congestion on the performance of the network model. The simulation was run at an 11 Mbps data rate (except for the congested node), and the total simulation time was 900 simulation seconds.

To observe congestion in IEEE 802.11 networks, a WLAN node model was created in OPNET IT Guru Academic Edition 9.1 (Figure 1.2).



Table 1.1 a Parameter Setting of WLAN Network

Parameter | Value
WLAN environment | Campus
Workspace area | 100 m x 100 m
Node model | wlan_station_adv

Table 1.1 b Parameter Setting of WLAN Network

Parameter | Value
Number of nodes | 6 (PCF_WKSTN)
Access point | 1 (AP_0)





compared with periphery nodes in order to study congestion. 

The packet size is exponentially distributed with a mean of 92 bytes. The inter-arrival time is exp(0.02) for all the nodes unless otherwise specified. Since the packet size is exponentially distributed with a mean of 92 bytes, an RTS/CTS exchange is required for most of the packets. All the wireless station nodes and the access point use Frequency Hopping Spread Spectrum at the physical layer. All the nodes employ the PCF basic CSMA/CA access mechanism. The nodes transmit at a maximum data rate of 2 Mbps. Packets received at a node with power less than 7.33E-14 W will find the receiver busy; packet transmissions with a power higher than this threshold are considered valid. Unless the default transmission power is changed, all WLAN packets should reach their destinations with sufficient power to be valid if the propagation distance between source and destination is less than 300 meters, as required by the IEEE 802.11 WLAN standard.
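As a quick sanity check on the traffic parameters above, the mean offered load per node follows from the mean packet size and the mean inter-arrival time. This sketch assumes that exp(0.02) denotes a mean inter-arrival time of 0.02 s, ignores MAC/PHY headers and RTS/CTS overhead, and uses a hypothetical helper name.

```python
def offered_load_bps(mean_packet_bytes: float, mean_interarrival_s: float) -> float:
    """Mean offered load in bit/s: (bits per packet) * (packets per second)."""
    return mean_packet_bytes * 8 / mean_interarrival_s


if __name__ == "__main__":
    per_node = offered_load_bps(92, 0.02)   # 736 bits per packet * 50 packets/s
    total = 6 * per_node                    # six PCF_WKSTN nodes
    print(f"per node: {per_node / 1e3:.1f} kbit/s, total: {total / 1e3:.1f} kbit/s")
```

Under these assumptions each station offers about 36.8 kbit/s of payload, or about 220.8 kbit/s for the six stations together.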




The distance between any two periphery nodes is about 50 meters. In the simulation model considered here, all the nodes are static. The simulations were carried out for 900 simulation seconds and repeated many times in order to ascertain validity.

V. Results 

The buffer size, bandwidth, and data rates of AP_0 have been reduced compared to the other nodes in order to study the impact on the performance of the network. Various global and individual node parameters were observed. The global parameters chosen were data dropped, load, and throughput; their variations against simulation time are shown in Figs 1.3, 1.4 and 1.5.

The individual node parameters chosen were retransmission attempts and data traffic received; they are plotted in Figures 1.6 and 1.7.




Fig- 1.3: Data Dropped with and without Congestion 

As the global data dropped (Fig. 1.3) shows, the data dropped in the network is very high compared to the situation in which all nodes have exactly the same buffer size, bandwidth, and data rates. Initially, it was estimated at 22269 times the no-congestion value; thereafter, it increases up to 206562 at 900 simulation seconds.

Another global parameter chosen is the load of the network (Fig- 1.4). Compared with the situation in which all nodes have exactly the same attributes, the load drops to 54.7% initially and further dips to 14.2% by the end of the 900 simulation seconds.




Fig- 1.4: Global Load with and without Congestion 
Another global parameter chosen is the throughput of the network (Fig- 1.5). The throughput is constant throughout the simulation time of 900 simulation seconds. In the congestion situation, the throughput stays constant at 0, a reduction of approximately 100% compared with the no-congestion situation; that is, under congestion the throughput is zero.




Fig- 1.5: Global Throughput with and without Congestion 
The impact of congestion on various individual nodes was also studied. The node parameters chosen for this study were retransmission attempts and data traffic received. Retransmission attempts are another important node parameter affected by congestion. The comparison of retransmissions of various nodes against simulation time is shown in Fig. 1.6. The retransmissions of PCF_WKSTN7 vary between 313 and 329. Likewise, the retransmissions of the other nodes, namely PCF_WKSTN10, PCF_WKSTN5, PCF_WKSTN6, PCF_WKSTN8 and PCF_WKSTN9, vary between (294-305), (280-310), (287-308), (280-312) and (280-312), respectively, whereas those of AP_0 are the highest, varying between 524 and 617. This means that the congested node has to perform the largest number of retransmissions in order to receive any data packet.




Fig-1.6:Node Comparison of Retransmission Attempts 




Fig- 1.7: Node Comparison of Data Traffic Received 

The variation of traffic received by the various nodes against simulation time is shown in Fig. 1.7. The traffic received by the congested node (AP_0) is the lowest of all the nodes. The traffic-received values of the other nodes, namely PCF_WKSTN5, PCF_WKSTN6, PCF_WKSTN7, PCF_WKSTN8, PCF_WKSTN9 and PCF_WKSTN10, vary between (161-166), (163-167), (163-166), (162-166), (163-166) and (163-168), respectively, whereas the value for AP_0 is just 0. This means that the congested node does not receive any data traffic.

VI. Conclusion 

In this paper, the performance of wireless infrastructure networks in terms of congestion has been studied with the OPNET simulator, and the results are presented for global as well as individual node parameters. It was observed that the global parameter data dropped increases under congestion, whereas the load and throughput decrease in the same situation. Among the individual node parameters, the number of retransmission attempts is highest at the congested node compared with the other nodes, and the data traffic received at the congested node was found to be zero throughout the simulation duration of 900 simulation seconds.
Abstract- In general terms, image spam is a type of e-mail in which the text message is presented as a picture in an image file. This prevents text-based spam filters from detecting and blocking such spam messages. In our study, we have considered a valid message as “ham” and an invalid message as “spam”. Though several techniques are available for detecting image spam (DNSBL, Greylisting, Spamtraps, etc.), each one has its own advantages and disadvantages, and because of their weaknesses their relative merits are disputed. This paper presents a general study of image spam detection using some of the popular methods. The methods comprise image spam filtering based on file type, RGB histogram, and HSV histogram, which are explained in the following sections. The best of these methods for detecting image spam can be determined from this study.

Keywords- File Type, HSV Histogram, Image Spam, RGB 
Histogram 

I. Introduction 

Spam can be described as Unsolicited Bulk E-mail (UNBE).

The most widely recognized category of spam is e-mail spam. Most UNBE is designed for elicitation, phishing, or advertisement. E-mail spam is the practice of sending unwanted e-mail messages (junk mail), frequently with commercial content, in large quantities to an indiscriminate set of recipients. Spam also extends to instant messaging systems: instant messaging spam, also known as “spim”, makes use of instant messaging systems. Mobile phone spam is directed at the text messaging service of a mobile phone. In a similar fashion, spam targets video-sharing sites, real-time search engines, online game messaging, and so on.

In the mid-1990s, when the Internet was opened up to the general public, spam in e-mail started to become a problem. In the following years the growth of spam was exponential, and today it comprises some 80-85 percent of all the e-mail in the world, by conservative estimate [3]. In recent years, spam has spread into all kinds of applications. Several service providers have adopted techniques to eliminate spam messages, and not all of them are noteworthy. All these techniques are losing their potency as spammers become more agile; in recent years, spammers have turned their attention towards image spam.

The embedded image carries the target message, and most e-mail clients display the message in its entirety. Many legitimate e-mails also contain images with properties similar to those of image-based spam, so existing spam filters are no longer capable of distinguishing between image-based spam and image ham [1]. This gives spammers an easy way to foil spam filters. The text embedded in image spam conveys the intent of the spammer; this text is usually an advertisement and often contains words that have been blacklisted by spam filters (drug store, stock tip, etc.).

II. Neural networks

Image spam can be identified using various methods; one of them is to use supervised and semi-supervised learning algorithms in neural networks. A neural network is a computational or mathematical model that tries to simulate the structure and/or functional aspects of biological neural networks. An artificial neural network is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase; neural networks are thus non-linear statistical data modeling tools. Types of artificial neural networks that can be trained to perform image processing include feed-forward neural networks, self-organizing feature maps, and learning vector quantizer networks. All these networks contain at least one hidden layer, with fewer units than the input and output layers.

In particular, the back-propagation neural network algorithm performs gradient descent in the parameter space, minimizing an appropriate error function. Parameters such as the mode of learning, information content, activation function, target values, input normalization, initialization, learning rate, and momentum determine the performance of back-propagation neural networks. The back-propagation neural network can be used to compress various types of images, such as natural scenes, satellite images, and standard test images. In this paper, a back-propagation neural network is implemented to detect image spam. Back propagation is a widely implemented learning algorithm in ANNs and is based on the error-correction learning rule. Error back-propagation consists of two passes through the layers of the network: a forward pass and a backward pass. In the forward pass, the synaptic weights of the network are fixed, whereas in the backward pass they are adjusted in accordance with an error-correction rule.

Neural networks have a wide range of applications, including data processing (filtering, clustering, blind source separation and compression), classification (pattern and sequence recognition, novelty detection and sequential decision making), and function approximation or regression analysis (time series prediction, fitness approximation and modeling).
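The forward and backward passes described above can be illustrated with a minimal back-propagation network in plain Python. This is a sketch under stated assumptions, not the network used in this paper: one hidden layer, sigmoid units, squared-error loss, online gradient descent without momentum, and hypothetical class and function names. The toy training data (logical OR) only stands in for real image-spam feature vectors.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyBackpropNet:
    """Hypothetical n_in -> n_hidden -> 1 network with sigmoid units and squared-error loss."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        """Forward pass: weights stay fixed while activations flow input -> hidden -> output."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x: Sequence[float], target: float, lr: float) -> float:
        """One forward and one backward pass (online gradient descent); returns the loss."""
        h, y = self.forward(x)
        err = y - target                          # dLoss/dy for loss = 0.5 * err**2
        delta_out = err * y * (1.0 - y)           # error at the output pre-activation
        # Backward pass: propagate the error to the hidden layer using the current weights.
        delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.b2 -= lr * delta_out
        for j, dj in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return 0.5 * err * err


def train(net: TinyBackpropNet, data, epochs: int, lr: float) -> List[float]:
    """Return the summed loss of each training epoch."""
    return [sum(net.train_step(x, t, lr) for x, t in data) for _ in range(epochs)]


if __name__ == "__main__":
    # Toy, linearly separable data (logical OR); real inputs would be image-spam features.
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    net = TinyBackpropNet(n_in=2, n_hidden=2)
    losses = train(net, data, epochs=3000, lr=1.0)
    print(losses[0], losses[-1])
```

The per-epoch loss falls as the error-correction updates accumulate; learning rate, initialization and the number of hidden units are exactly the kind of parameters the text lists as governing back-propagation performance.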

III. Related work

Many studies have previously addressed image spam detection. This section gives an overview of the literature.

Marco Barreno et al. [2] explain the different types of attacks on machine learning algorithms and systems, a variety of defenses against those attacks, and the ideas that are important for securing machine learning. This work illustrates the methods that spammers use to attack a system with image spam. The issue of machine learning security goes beyond intrusion detection systems and spam e-mail filters. The defense measures discussed include robustness, attack detection, disinformation, randomization for targeted attacks, and the cost of countermeasures.

A modification of Latent Dirichlet Allocation (LDA), known as the multi-corpus LDA technique, was introduced by Istvan Biro et al. [4]. They created a bag-of-words document for every web site and ran LDA on both the corpus of sites labeled as spam and the corpus labeled as non-spam. This allowed them to collect spam and non-spam topics during the training phase, which they then applied to unseen test sites to detect spam. This method was tested in combination with the Web Spam Challenge 2008 public features and the connectivity sonar features. Using logistic regression to aggregate these classifiers, multi-corpus LDA yields an improvement of around 11 percent in F-measure and 1.5 percent in ROC.

Spam web page detection through content analysis was put forth by Alexandros Ntoulas et al. [11], who proposed several previously undescribed techniques for automatic spam detection. They also discussed the effectiveness of those techniques in isolation and when aggregated using classification algorithms, which proved to be trustworthy in detecting image spam.

Bhaskar Mehta et al. [2] describe two solutions for detecting image-based spam after considering the characteristics of image spam. The first uses visual features for classification and offers an accuracy of about 98 percent, i.e., an improvement of at least 6 percent over existing solutions. Second, they used SVMs to train classifiers on judiciously chosen color, texture, and shape features, which helped them detect near-duplicate images. The strategies for image spam detection discussed in their work are near-duplicate detection in images, visual features for classification, an algorithm for classification of visual features, and optical character recognition (OCR).

Clustering-based spam detection was put forth by Chun Wei et al. [23], who propose a fuzzy-matching algorithm to group the subjects of spam e-mails generated by malware. Subjects similar to each other are found using dynamic programming. The main idea is that the recursive seed selection strategy allows the algorithm to detect similar patterns even when the spammer creates a variation of the original pattern. This proved to be an effective approach for detecting and grouping spam e-mails generated from templates. The clustering algorithm is used to find the similarity of strings and of spam subjects, and to cluster spam subjects.
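A common dynamic-programming way to score how similar two subject lines are is the Levenshtein edit distance. The sketch below uses it with a simple greedy grouping step; it is a stand-in for illustration and not the recursive seed-selection algorithm of [23]. The function names and the 0.8 similarity threshold are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, keeping only one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, decreasing toward 0.0 as the edit distance grows."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def group_subjects(subjects: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: add each subject to the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []
    for s in subjects:
        for cluster in clusters:
            if similarity(s, cluster[0]) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    print(group_subjects(["Cheap meds now", "Cheap meds n0w", "Meeting at 3pm"]))
```

A spammer's small variation (here "n0w" for "now") costs only one edit, so the two template-generated subjects land in the same cluster while the unrelated subject starts its own.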

Seongwook Youn and Dennis McLeod [24] describe a method of filtering gray e-mail using personalized ontologies: a personalized ontology spam filter makes the decisions for gray e-mail. Gray e-mail is a message that could reasonably be considered either spam or ham. A user profile is created for each user or class of users to handle gray e-mail, and this profile ontology creates a blacklist of contacts and topic words.

A. Image Spam Classification Based On Text Properties

Image Spam classification based on text properties includes 
finding the location of texts in images or videos. Texts are 
usually designed to attract attention and to reveal 
information. The Connected Component based (CC) and the 
texture-based approaches are the two leading approaches 
used in the past to extract the characteristics for the text 
detection task. These characteristics include coherence in 
space, geometry and color [5]. In CC-based methods [6], the 
image is segmented into a set of CCs and is grouped into 
potential text regions based on their geometric relation. 
These potential regions are then examined using some rule- 
based heuristics, which makes use of the characteristics like 
size, the aspect ratio and the orientation of the region. The 
efficiency of this method becomes questionable when the 
text is multi-colored, textured, with a small font size, or 
overlapping with other graphical objects. 

In texture-based methods [7], it is assumed that text has distinct textural properties that can be used to distinguish it from the background. Even though this method performs well for images with noisy, degraded, or complex text and/or background, it tends to be time-consuming, as texture analysis is inherently computationally intensive.

The increased use of the Internet in recent years has led to tremendous growth in the volume of text documents available online. Accordingly, the management and organization of text has become an important task, and text categorization, i.e., assigning documents to a set of predefined categories, is used for this purpose. A number of machine learning algorithms, such as K-nearest neighbor, the centroid classifier, Naive Bayes (NB), Winnow and Support Vector Machines (SVM), are extensively used for text classification. OCR is used to isolate text from an image: semantic analysis of text embedded in images attached to e-mails first requires text extraction by OCR techniques.

Naive Bayes is a simple classifier most commonly used in pattern recognition; although it assumes that the feature attributes are independent, the accuracy of Naive Bayes classification is typically high [8]. Support Vector Machines (SVM) have been widely applied in practical applications because of their strength in handling high-dimensional data. Parameter tuning [9, 10] and thresholding [11, 12] are two common techniques applied before and after the SVM algorithm, respectively.
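Once OCR has turned the embedded text into word tokens, the Naive Bayes classifier mentioned above can be applied directly. The following is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing; the class name and the tiny training set are hypothetical, and the tokens stand in for OCR output rather than data from this study.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior: Dict[str, float] = {}
        self.counts: Dict[str, Counter] = {}
        self.totals: Dict[str, int] = {}
        for c in self.classes:
            class_docs = [doc for doc, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for doc in class_docs for w in doc)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_posterior(self, doc: List[str], c: str) -> float:
        # "Naive" assumption: words are independent given the class, so log-probabilities add.
        v = len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            for w in doc if w in self.vocab)   # words never seen in training are ignored

    def predict(self, doc: List[str]) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


if __name__ == "__main__":
    docs = [["cheap", "meds", "buy"], ["stock", "tip", "buy"],
            ["meeting", "agenda"], ["project", "meeting", "notes"]]
    labels = ["spam", "spam", "ham", "ham"]
    nb = MultinomialNaiveBayes().fit(docs, labels)
    print(nb.predict(["buy", "cheap"]), nb.predict(["meeting", "notes"]))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen-in-class word from forcing a probability of zero.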

B. Image Spam Classification Based On Content 

One popular practice when creating spam pages is “keyword stuffing” [13]. In content-based image spam detection, we investigate whether an excessive number of words within a web page (excluding markup) is a good indicator of spam. In the next step, we determine whether there is an excessive appearance of keywords in the title of a page. An uncommon practice observed in the manually tagged data set is the use of “composite” words in spam pages.

Content-based Naive Bayes (PGRAM) is another technique for the classification of image spam. In [14], a partial Naive Bayes approach to spam detection, biased towards low false-positive rates, is proposed; it also uses word tokens but filters out predefined common tokens. Most available anti-spam techniques analyze the content and the header of the incoming e-mail [15]. They try to infer something about the kind of material contained in the message by looking for specific patterns typical of spam messages. For this reason, these filters are known as “content based.” Many available anti-spam techniques fall under this category.

Blacklist and whitelist filters check whether the incoming message is from a known and trusted e-mail address. Rule-based filters assign a score to every incoming e-mail, calculated according to a set of rules based on typical features of spam messages (fake SMTP components, keywords, HTML formatting, etc.) [16]. If the score exceeds a given threshold value, the message is classified as spam. A major problem with this method is that, since its semantics are not well defined, it is difficult to aggregate rules and to establish a threshold that limits the number of false positives. SpamAssassin [17] is a successful implementation of this technique.
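The score-and-threshold logic of a rule-based filter fits in a few lines. The rules, weights and threshold below are hypothetical illustrations, not SpamAssassin's actual rule set or defaults; each rule is a name, a predicate on the raw message text, and a weight.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical rules: (name, predicate on the raw message text, score weight).
Rule = Tuple[str, Callable[[str], bool], float]

RULES: List[Rule] = [
    ("mentions_stock_tip", lambda m: "stock tip" in m.lower(), 2.5),
    ("shouting_subject",
     lambda m: re.search(r"^Subject: [A-Z0-9 !]{10,}$", m, re.MULTILINE) is not None, 1.5),
    ("html_without_text_part",
     lambda m: "<html" in m.lower() and "text/plain" not in m.lower(), 1.0),
]


def rule_score(message: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Sum the weights of all rules that fire and return the names of the rules that fired."""
    hits = [(name, weight) for name, test, weight in rules if test(message)]
    return sum(w for _, w in hits), [name for name, _ in hits]


def is_spam(message: str, threshold: float = 3.0, rules: List[Rule] = RULES) -> bool:
    """Classify the message as spam when its score exceeds the threshold."""
    return rule_score(message, rules)[0] > threshold


if __name__ == "__main__":
    msg = "Subject: HOT STOCK TIP NOW!!!\n\nGet this stock tip today."
    print(rule_score(msg), is_spam(msg))
```

The weakness noted in the text is visible here: nothing in the weights or the threshold has a principled meaning, so tuning them to limit false positives is a matter of trial and error.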

IV. Proposed methodology

In this work, we propose two new image spam classifiers based on the file properties and the histogram of an image. The proposed techniques are explained in the following sections.

A. Image Spam Detection Based On Their File Type 



One method of detecting image spam is based on the file type. Image spam e-mails mostly contain images in JPEG or GIF format. The basic features (see Table 1) that can be derived from an image at an extremely low computational cost are the width and height denoted in the header of the image file, the image file type, and the file size. In this study, we focus on all the file formats that are commonly seen in e-mail: the Graphics Interchange Format (GIF), the Joint Photographic Experts Group (JPEG) format, Bitmap (BMP), and Portable Network Graphics (PNG). By parsing the image file headers with a minimal parser, the image dimensions (width and height) can be obtained; since this does not involve any decompression or decoding of the actual image data, the dimensions can be obtained quickly.
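The header-only parsing described above can be sketched in Python for three of the four formats. The function name is hypothetical; the byte offsets follow the published GIF (logical screen descriptor), PNG (IHDR chunk) and BMP (BITMAPINFOHEADER) layouts. JPEG is left out because its dimensions sit in a start-of-frame marker that must be located by walking the marker segments. For GIF, the value returned is the logical screen size, which is exactly what can disagree with the frames' real extents (the virtual-frame issue discussed next).

```python
import struct
from typing import Optional, Tuple


def header_dimensions(data: bytes) -> Optional[Tuple[str, int, int]]:
    """Return (file_type, width, height) read from the header bytes only, or None."""
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        # Logical screen descriptor: little-endian uint16 width, then height.
        width, height = struct.unpack("<HH", data[6:10])
        return "GIF", width, height
    if data[:8] == b"\x89PNG\r\n\x1a\n" and len(data) >= 24 and data[12:16] == b"IHDR":
        # First chunk must be IHDR: big-endian uint32 width, then height.
        width, height = struct.unpack(">II", data[16:24])
        return "PNG", width, height
    if data[:2] == b"BM" and len(data) >= 26:
        # Assumes a BITMAPINFOHEADER (or later): signed little-endian int32 width and height;
        # a negative height marks a top-down bitmap. Old 16-bit OS/2 headers are not handled.
        width, height = struct.unpack("<ii", data[18:26])
        return "BMP", width, abs(height)
    return None


if __name__ == "__main__":
    gif = b"GIF89a" + struct.pack("<HH", 320, 200) + b"\x00" * 3
    print(header_dimensions(gif))
```

Only the first few dozen bytes are inspected, which is why these features are essentially free to compute compared with decoding the pixels.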

GIF files may contain virtual frames [18], which may be either larger or smaller than the actual image width, and this issue can be detected by decoding the image data. The problem with corrupted images is that the lines near the bottom of the image will not decode properly, and the point of corruption is decisive for any further decoding of the image data. To measure the amount of information gained from the above features, a signal-to-noise ratio (S2N) is defined. It is calculated as the distance between the arithmetic means of the spam and ham classes divided by the sum of the corresponding standard deviations.

$$\mathrm{S2N} = \frac{|\mu_{\text{spam}} - \mu_{\text{ham}}|}{\sigma_{\text{spam}} + \sigma_{\text{ham}}}$$

where μ_spam is the mean value for spam, μ_ham is the mean value for ham, σ_spam is the standard deviation for spam, and σ_ham is the standard deviation for ham.
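A direct transcription of the S2N formula is sketched below; the helper names are hypothetical. The first function takes the class statistics directly (as reported in Tables 4 and 5), and the second computes them from raw per-image feature values. The text does not say which standard-deviation estimator was used, so the sample standard deviation here is an assumption.

```python
import statistics


def signal_to_noise(mu_spam, mu_ham, sigma_spam, sigma_ham):
    """S2N = |mu_spam - mu_ham| / (sigma_spam + sigma_ham)."""
    return abs(mu_spam - mu_ham) / (sigma_spam + sigma_ham)


def s2n_from_samples(spam_values, ham_values):
    """Compute S2N for one feature from raw per-image values of each class.

    Uses the sample standard deviation (an assumption; the estimator
    is not specified in the text).
    """
    return signal_to_noise(
        statistics.mean(spam_values),
        statistics.mean(ham_values),
        statistics.stdev(spam_values),
        statistics.stdev(ham_values),
    )


# Feature f1 (GIF) statistics: mu_spam, mu_ham, sigma_spam, sigma_ham.
print(round(signal_to_noise(519.2, 257.0, 176.4, 1216.5), 3))  # 0.188
```

Plugging the GIF statistics for f1 back in reproduces the tabulated S2N of 0.188, which is a quick consistency check on the reconstructed formula.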



Table 1. Image Features

Feature   Description
f1        Image width denoted in header
f2        Image height denoted in header
f3        Aspect ratio: f1/f2
f4        File size
f5        File area: f1 × f2
f6        Compression: f5/f4



This feature analysis reveals that the binary (file-format) features reflect the percentage of images in the respective formats. Beyond the binary image-format feature, f6 is the most informative feature. Most legitimate images in e-mails ("ham") are JPEG images. Feature f3 is the aspect ratio of the image, f1/f2. Feature f6 captures the amount of compression achieved, calculated as the ratio of the number of pixels in an image to its file size; compression is better when more pixels are stored per byte. This motivates us to classify image spam with a similar supervised learning (data modeling) approach.
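The Table 1 features can be assembled from the three header-level quantities (width, height, file size) plus the file format. The sketch below is one possible layout, with a hypothetical function name and one binary indicator per format to stand in for the "binary image format feature" mentioned above; the exact encoding used in the study is not specified.

```python
FORMATS = ("GIF", "JPEG", "PNG", "BMP")


def image_features(width, height, file_size, fmt):
    """Return the Table 1 features f1-f6 plus one binary indicator per format."""
    if width <= 0 or height <= 0 or file_size <= 0:
        raise ValueError("width, height and file size must be positive")
    area = width * height
    features = {
        "f1": width,             # width denoted in header
        "f2": height,            # height denoted in header
        "f3": width / height,    # aspect ratio f1/f2
        "f4": file_size,         # file size in bytes
        "f5": area,              # file area f1 x f2 (pixel count)
        "f6": area / file_size,  # compression f5/f4: pixels stored per byte
    }
    for name in FORMATS:
        features["is_" + name] = int(fmt == name)  # binary image-format features
    return features
```

For example, a 400 × 200 GIF stored in 20,000 bytes has f3 = 2.0 and f6 = 4.0 pixels per byte.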

B. Image Spam Detection Using RGB Histogram

Image spammers implement different randomization techniques to introduce noise into spam images, so no single feature can detect all the variations introduced into an image. Hence, a color histogram filter can be used to detect image spam in this case [19]. Spam image generation algorithms are intended to defeat well-known vision algorithms such as OCR (Optical Character Recognition). The color histogram is a simple feature and can be computed very efficiently in a single pass over the whole image.

A 64-dimensional color histogram can be used in the RGB color space. The values of each color channel (R, G, and B) are divided into 4 bins of equal size, resulting in 4 × 4 × 4 = 64 bins in total. For each bin, the number of pixels whose color falls into that bin is counted, and the histogram is then normalized so that its entries sum to one. The distance between two color histogram features can be determined using the L1 distance [19]. For d-dimensional, real-valued feature vectors X = (X_1, ..., X_d) and Y = (Y_1, ..., Y_d), the L1 distance has the form

$$d(X, Y) = \sum_{i=1}^{d} |X_i - Y_i|$$


Frankel et al. [20] quantify color saturation as the fraction of the total number of pixels in the image for which the difference max(R, G, B) - min(R, G, B) is greater than some threshold value T. The threshold value can be set by the evaluator. This fraction is evaluated separately for the text and non-text regions of the image, which leads to two color saturation features.
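The saturation measure attributed to Frankel et al. reduces to a simple count. In the sketch below (hypothetical name), the caller is assumed to have already split the image into text and non-text regions, since the segmentation step is not described here; calling the function once per region gives the two features.

```python
def color_saturation(pixels, threshold):
    """Fraction of pixels with max(R, G, B) - min(R, G, B) > threshold."""
    if not pixels:
        raise ValueError("region has no pixels")
    saturated = sum(
        1 for r, g, b in pixels if max(r, g, b) - min(r, g, b) > threshold
    )
    return saturated / len(pixels)
```

For instance, with T = 50, a region containing pure red, mid-gray, a dark near-gray (10, 20, 30), and pure green has two saturated pixels out of four, giving 0.5.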

C. Image Spam Detection Using HSV Histogram

The HSV color space is fundamentally different from the widely known RGB color space because it separates the intensity from the color information. HSV stands for Hue, Saturation, and Value. Value represents the intensity of a color and is decoupled from the color information in the represented image.

A three-dimensional representation of the HSV color space is a hexacone, in which the central vertical axis represents the intensity [21]. Hue is the angle relative to the red axis. Saturation is the depth or purity of the color and is measured as the radial distance from the central axis, ranging from 0 at the center to 1 at the outer surface.
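Python's standard `colorsys` module implements the RGB-to-HSV (hexcone) conversion. The small wrapper below (hypothetical name) scales 8-bit channels into [0, 1] and expresses hue as an angle in degrees from the red axis, matching the geometric description above.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value).

    colorsys expects floats in [0, 1] and returns hue as a fraction of a
    full turn, so hue is multiplied by 360 to give an angle from red.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360.0, s, v
```

Pure red maps to hue 0°, pure green to 120°, and any gray has saturation 0 because it lies on the central axis.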

Spammers intend their messages to be easily noticed by users, so they tend to use highly contrasting colors in their designs. This property is called "conspicuousness", meaning "obvious to the eye". The HSV histogram is converted into three bins and passed to a back-propagation neural network (BPNN), with the number of training epochs set to 300 and the error goal set to 0.001.
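The text does not specify how the HSV histogram is "converted into three bins". One plausible reading, assumed here, is three equal-width bins for each of the H, S, and V channels, giving a normalized 9-value feature vector that could then be fed to a classifier such as the BPNN. The function name is hypothetical and the network itself is not reproduced.

```python
import colorsys


def hsv_histogram(pixels, bins=3):
    """Per-channel H, S, V histograms with `bins` equal-width bins each.

    pixels is a list of 8-bit (r, g, b) tuples; the result is the three
    normalized histograms concatenated (H first, then S, then V).
    """
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [[0] * bins for _ in range(3)]
    for r, g, b in pixels:
        hsv = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        for channel, value in enumerate(hsv):
            # Values lie in [0, 1]; clamp 1.0 into the last bin.
            idx = min(int(value * bins), bins - 1)
            hist[channel][idx] += 1
    n = len(pixels)
    return [count / n for channel in hist for count in channel]
```

Highly saturated, bright colors (typical of conspicuous spam) push mass into the top S and V bins, which is the kind of signal this feature vector is meant to expose.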

V. Experimental results 

To design and evaluate our spam detection algorithms, we used a collection of 5000 randomly selected images from the spam archive dataset. Accuracy (A), precision (P), and recall (R) are well-known performance measures. A low precision indicates many false positives, i.e., many legitimate messages (ham) are misjudged as spam. A low recall indicates many false negatives, i.e., the detector has misclassified many spam messages as ham. Raising one of these measures usually lowers the other, so we consider the trade-off between spam and ham errors reflected in the precision and recall values.

These measures are defined below and used in this study.

$$\text{Accuracy}(A) = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision}(P) = \frac{TP}{TP + FP}$$

$$\text{Recall}(R) = \frac{TP}{TP + FN}$$

TP is the number of e-mails that are spam and correctly predicted as spam; FP is the number of e-mails that are legitimate but predicted as spam; TN is the number of e-mails that are legitimate and correctly predicted as legitimate (ham); and FN is the number of e-mails that are spam but predicted as legitimate.
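The three measures follow directly from the confusion-matrix counts. The sketch below (hypothetical name) treats spam as the positive class and returns 0.0 for precision or recall when the denominator is zero, which is a convention assumed here rather than one stated in the text.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) with spam as the positive class."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no e-mails were classified")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged mail that is spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of spam that is caught
    return accuracy, precision, recall
```

With illustrative counts TP = 90, TN = 80, FP = 20, FN = 10, this gives A = 0.85, P ≈ 0.818, and R = 0.9. Precision and recall for the ham class can be obtained by treating ham as the positive class, i.e., `detection_metrics(tn, tp, fn, fp)`.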



Table 2. Confusion Matrix

Prediction \ Observed     Legitimate     Spam
Legitimate                TN             FN
Spam                      FP             TP



Table 3 compares the accuracy (A), precision (P), and recall (R) of the different spam detection approaches: detection based on file properties, on the RGB histogram, on the HSV histogram, and on a combination of the RGB and HSV histograms.





Approach                                Accuracy (A)      Precision (P)     Recall (R)
                                        Ham      Spam     Ham      Spam     Ham      Spam
File properties                         90.5%    86.6%    84.5%    80.6%    88.3%    85.7%
RGB histogram                           94.6%    92.1%    88.7%    84.1%    90.5%    89.6%
HSV histogram                           96.5%    95.4%    90.5%    88.7%    92.0%    91.4%
Combination of RGB and HSV histograms   99.3%    99.1%    98.3%    95.5%    96.8%    95.9%

Table 3. Comparison of Accuracy, Precision, Recall



For spam detection based on file properties, the signal-to-noise ratios for GIF and JPEG images are tabulated below. Table 4 [18] shows the signal-to-noise ratio calculated for spam and ham messages in GIF format. Similarly, Table 5 [18] shows the signal-to-noise ratio for JPEG images only, for the different features listed in Table 1. Based on the signal-to-noise ratios obtained for the different features of an image, spam messages can be separated from ham messages, which reduces the false-positive rate.



Feature   S2N     μ_spam      μ_ham      σ_spam      σ_ham
f1        0.188   519.2       257.0      176.4       1216.5
f2        0.143   356.3       165.1      128.7       1208.7
f3        0.043   1.76        53.8       4.58        1206.1
f4        0.100   15269.6     29347.1    13459.1     127587.5
f5        0.767   195339.6    42098.9    107180.16   92658.9
f6        0.524   16.97       5.00       10.4        12.50

Table 4. Feature Quality (GIF Only)



Feature   S2N     μ_spam      μ_ham       σ_spam     σ_ham
f1        0.289   422.08      618.40      133.16     546.64
f2        0.308   305.50      496.66      129.20     491.59
f3        0.040   2.05        2.12        2.005      14.98
f4        0.272   21601.06    203686.40   12787.30   655880.90
f5        0.323   127524.60   539062.50   71339.82   1202866.95
f6        0.265   6.70        4.82        3.90       3.15

Table 5. Feature Quality (JPEG Only)

Fig. 2 compares the precision and recall values of the different approaches (file properties, RGB histogram, HSV histogram, and their combination) for detecting image spam.

[Figure: Comparison of Precision and Recall; series: File Properties, RGB, HSV, RGB and HSV; axis: Recall (R)]

Fig. 2 Comparison of precision and recall values for different types of HSV and RGB histogram generation.

VI. Conclusion 

This paper presents a general study of image spam, the classification of image spam on the basis of text properties and content properties, and some methodologies for detecting image spam. Detecting image spam by its file properties is reasonably effective; however, this method eliminates only about 80 percent of spam messages, which makes it unsuitable on its own for most cases. Spam messages need further filtering after the file-type detector to eliminate spam e-mails completely. The second approach, image spam detection using the color histogram, is more advanced than the first. Because it uses a distance measure between histograms, it appears better suited to detecting spam than the file-property approach. The HSV histogram method discussed last is the most advanced of the three at eliminating spam messages. It uses color moments to determine the saturation level of contrasting colors, which appears effective for spam detection and keeps the false-positive rate to a minimum. The HSV histogram approach performs better at detecting image spam than detection based on file type or on the color histogram, and it also performs better under varying brightness and contrast settings.
Abstract- Fuzzy approaches to congestion control in ATM networks are an important research area. A control scheme that dynamically regulates traffic flow according to changing network conditions requires an understanding of network dynamics. To minimize congestion through gradual rather than abrupt control changes, we propose a fuzzy approach. In our scheme, burst length as well as buffer occupancy are represented by triangular membership functions of fuzzy sets. However, these improvements are achieved at the cost of higher time complexity.

I. Introduction 

A major development in high-speed networking is the emergence of B-ISDN and ATM. ATM has been designed to support various classes of multimedia traffic with different bit rates and QoS requirements. Owing to the unpredictable fluctuations and burstiness of traffic flows within multimedia networks, congestion can occur frequently. It is therefore necessary to design appropriate congestion control mechanisms to ensure that the promised QoS is met. High-speed networking has also shifted the network's performance bottleneck from the channel transmission speed to the propagation delay of the channel and the processing speed at the network switching nodes [1].

Consequently, congestion prevention can be interpreted as the problem of matching the admitted traffic to the network resources. This, in turn, can be viewed as a classical feedback control problem, i.e., matching the output to the input of a dynamical system [2]. In feedback control, when possible traffic congestion is detected at any network element, feedback signals are sent back to all sources. ATM layer congestion control refers to the set of actions taken by the network to minimize the intensity, spread, and duration of congestion. Feedback flow control is one of the solutions reported in the literature [3], [4], [5].

The growing success of fuzzy logic in various fields of application, such as control, decision support, knowledge-based systems, database information retrieval, and pattern recognition, is due to its inherent capacity to formalize control algorithms that can tolerate imprecision and uncertainty, emulating the cognitive processes that human beings use every day [6], [7], [8]. Fuzzy logic systems have been successfully applied to congestion control problems in ATM networks and have provided a robust mathematical framework for dealing with real-world imprecision [9], [10]. The fuzzy approach exhibits a soft behavior, which means a greater ability to adapt to dynamic, imprecise, and bursty environments. Comparative studies [11] have shown that fuzzy approaches significantly improve system performance compared with conventional approaches.

In conventional schemes, a binary threshold divides the buffer space into two parts: while the buffer occupancy is below or equal to the threshold level, every arriving cell is admitted to the network, and above the threshold every cell is rejected. In the fixed-threshold case described by Bonde et al. [11], the two buffer states, block and admit, can be replaced by fuzzy sets. We propose the use of fuzzy logic for a dynamic feedback threshold scheme. In the applied fuzzy scheme, burst length as well as buffer occupancy are represented by triangular membership functions.

II. Fuzzy expert system

Fuzzy logic provides a general concept for description and 
measurement. Unlike traditional Aristotelian two-valued 
logic, in fuzzy logic, fuzzy set membership occurs by degree 
over the range [0,1], which is represented by a membership 
function. The function can be linear or non-linear. 

A. From Fuzzy Sets to Fuzzy Events

Fuzzy set theory, compared to other mathematical theories, 
is perhaps the most easily adaptable theory to practice. The 
main reason is that a fuzzy set has the property of relativity, 
variability, and inexactness in the definition of its elements. 
Instead of defining an entity in calculus by assuming that its 
role is exactly known, we can use fuzzy sets to define the 
same entity by allowing possible deviations and inexactness 
in its role. This representation suits the uncertainties encountered in practical life well, which makes fuzzy sets a valuable mathematical tool.

III. Model of fuzzy controller

Fuzzy systems are rule-based systems with a strong mathematical basis. A fuzzy system consists of a fuzzifier, a defuzzifier, an inference engine, and a rule base, as shown in Fig. 1. The role of the fuzzifier is to map the crisp input data values to fuzzy sets defined by their membership functions, according to the degree of "possibility" of the input data. The goal of the defuzzifier is to map the output fuzzy sets to a crisp output value; it combines the different fuzzy sets with their different degrees of possibility to produce a single numerical value.
Fig. 1: Model of Fuzzy Controller

The fuzzy inference engine defines how the system infers through the rules in the rule base to determine the output fuzzy sets. Ray-Guang Cheng et al. developed a model of a fuzzy traffic controller in which the input linguistic variables are chosen so that the controller is a closed-loop system with stable and robust operation.

The heart of a fuzzy system is the rule base, which consists of a set of IF-THEN rules. The rules are statements in which some words are characterized by continuous membership functions. For example, in the rule "IF the link is close to congestion THEN reduce the input rate", the words "close to congestion" are characterized by a membership function, as shown in Figs. 2(a) and 2(b), where congestion is considered to occur when the link utilization is above 0.8.






Fig. 2: A Typical Representation of Buffer Occupancy as 
well as Burst Length by Fuzzy Sets 

L, B, M, A and H represent Low, below medium, Medium, 
above medium and High membership sets respectively. 

M. V. represents membership values. 
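Fig. 2 itself is not reproduced here, but the worked example later in this section (0.625 belongs to M and A with 0.5 each; 0.75 belongs to A with 1.0 and H with 0.0) is consistent with five evenly spaced triangular sets peaking at 0, 0.25, 0.5, 0.75, and 1.0 with half-width 0.25. The sketch below fuzzifies an input under that inferred shape; the peaks, half-width, and names are assumptions, not values read from the figure.

```python
# Peaks of the five triangular sets on the normalized [0, 1] axis.
# ASSUMPTION: inferred from the worked example, not read from Fig. 2.
SET_PEAKS = {"L": 0.0, "B": 0.25, "M": 0.5, "A": 0.75, "H": 1.0}
HALF_WIDTH = 0.25


def triangular(x, peak, half_width=HALF_WIDTH):
    """Triangular membership: 1 at the peak, falling linearly to 0 at +/- half_width."""
    return max(0.0, 1.0 - abs(x - peak) / half_width)


def fuzzify(value, max_value):
    """Normalize a crisp value by its maximum and return its membership in each set."""
    x = min(max(value / max_value, 0.0), 1.0)  # clamp to [0, 1]
    return {name: triangular(x, peak) for name, peak in SET_PEAKS.items()}
```

With a maximum of 8, a buffer occupancy of 5 gives M = 0.5 and A = 0.5, and a burst length of 6 gives A = 1.0, matching the memberships used in the worked example.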

The fuzzy system encodes expert knowledge about the system to be implemented rather than modeling the actual system; it therefore resembles a rule-based expert system. However, unlike an expert system, a fuzzy system does not fail when faced with a control situation for which no rule is defined. Instead, controls are inferred using the membership functions to generate approximate control actions.

IV. Applied fuzzy approach

In the applied scheme, the output buffer is divided into a number of equal parts, namely two, three, and four, and feedback is applied after 50%, 33%, and 25% of the buffer space has been filled, i.e., for N = 2, 3, and 4, respectively. Depending on which threshold has been crossed, the network gets a mild warning or an ultimatum. A gradual change is more intuitive here, and this has been incorporated with fuzzy logic. In the applied fuzzy scheme, burst length as well as buffer occupancy are represented by triangular functions, as shown in Fig. 2. The degree of membership in a particular set associated with each valid buffer occupancy can be read from this figure; this quantification of membership is called fuzzification. From these membership values and the corresponding sets, the blocking to be offered can be found, again in fuzzy terms; this process is called rule-based inference. As an example, a typical rule is: when buffer occupancy is high and burst length is high, the number of blocked cells is also high, as shown in Lookup Tables 3.1(a) and 3.1(b).
Table 3.1(a): Lookup Table

Table 3.1(b): Lookup Table

Then, by applying a suitable defuzzification method, the percentage blocking to be offered at that particular buffer occupancy level and burst length can be determined. For defuzzification, the weighted-average method is used with value sets such as those shown in Table 3.2.



SET   L      B      M      A      H
I     0.05   0.25   0.50   0.75   0.95
II    0.05   0.20   0.40   0.60   0.80

Table 3.2: Defuzzification Table

L: Low Set, B: Below Medium Set, M: Medium Set, A: Above Medium Set, H: High Set
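Weighted-average defuzzification with the two value sets of Table 3.2 can be sketched as follows. The dictionary layout and function name are hypothetical; returning 0.0 when every membership is zero (no blocking) is an assumption, since the text does not cover that case.

```python
# Crisp blocking values for each output set, from Table 3.2 (sets I and II).
DEFUZZ_SETS = {
    "I": {"L": 0.05, "B": 0.25, "M": 0.50, "A": 0.75, "H": 0.95},
    "II": {"L": 0.05, "B": 0.20, "M": 0.40, "A": 0.60, "H": 0.80},
}


def defuzzify(memberships, table="I"):
    """Weighted average of the crisp set values, weighted by membership."""
    values = DEFUZZ_SETS[table]
    total_weight = sum(memberships.values())
    if total_weight == 0:
        return 0.0  # ASSUMPTION: no active output set means no blocking
    return sum(w * values[s] for s, w in memberships.items()) / total_weight
```

For blocking memberships A = 0.5 and H = 0.5, set I gives (0.5 × 0.75 + 0.5 × 0.95) / 1.0 = 0.85, while set II would give 0.70.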

A typical example is as follows. Assume that buffer occupancy and burst length are both characterized by the fuzzy sets described in Fig. 2(a), that the maximum buffer size is 8, and that the maximum burst length is 8. Suppose that when a new cell burst arrives, the buffer occupancy is 5 and the arriving burst length is 6. Normalized with respect to the maximum value of 8, these variables map to buffer occupancy = 0.625 and burst length = 0.75. Using the fuzzy sets of Fig. 2(a) for fuzzification, the buffer occupancy is a member of set M with associated value 0.5 and a member of set A with associated value 0.5. The burst length is a member of set A with associated value 1.0 and a member of set H with associated value 0.0. Using Table 3.1(a) and the min-max method of evaluation, we get:



Buffer occupancy M(0.5) and burst length A(1.0) => Blocking of A(0.5)
Buffer occupancy M(0.5) and burst length H(0.0) => Blocking of H(0.0)
Buffer occupancy A(0.5) and burst length A(1.0) => Blocking of H(0.5)
Buffer occupancy A(0.5) and burst length H(0.0) => Blocking of H(0.0)

Thus, taking the maximum of the three values associated with H, blocking is a member of set A with value 0.5 and a member of set H with value 0.5. Using these sets with a weighted average of the membership values, the percentage blocking to be offered can be found. For defuzzification, set I of Table 3.2 is used.



$$\text{Blocking} = \frac{(0.5 \times 0.75) + (0.5 \times 0.95)}{0.5 + 0.5} = 0.85$$

Thus, the percentage blocking to be offered under the proposed scheme is 85%. Using this method of determining the percentage blocking for incoming cells, an ATM node was simulated, and the performance of the scheme was compared with the static and dynamic feedback schemes.
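The whole worked example (fuzzification, min for AND, max aggregation per output set, weighted-average defuzzification with set I) fits in a short, self-contained sketch. Lookup Table 3.1(a) is not reproduced in the text, so only the four rule entries quoted in the example are included; the triangular set shape is inferred from the example's membership values, and all names are hypothetical. Inputs that activate other rule cells would need the full table.

```python
# Triangular set peaks on [0, 1], half-width 0.25 (inferred from the example).
PEAKS = {"L": 0.0, "B": 0.25, "M": 0.5, "A": 0.75, "H": 1.0}
# Defuzzification values, set I of Table 3.2.
SET_I = {"L": 0.05, "B": 0.25, "M": 0.50, "A": 0.75, "H": 0.95}
# PARTIAL rule table: (buffer set, burst set) -> blocking set.
# Only the entries quoted in the worked example; Table 3.1(a) is not available.
PARTIAL_RULES = {("M", "A"): "A", ("M", "H"): "H", ("A", "A"): "H", ("A", "H"): "H"}


def fuzzify(x):
    """Membership of a normalized value x in each triangular set."""
    return {s: max(0.0, 1.0 - abs(x - p) / 0.25) for s, p in PEAKS.items()}


def percentage_blocking(buffer_occupancy, burst_length,
                        max_buffer=8, max_burst=8, rules=PARTIAL_RULES):
    buf = fuzzify(buffer_occupancy / max_buffer)
    burst = fuzzify(burst_length / max_burst)
    blocking = {}
    for (buf_set, burst_set), out_set in rules.items():
        strength = min(buf[buf_set], burst[burst_set])                   # AND -> min
        blocking[out_set] = max(blocking.get(out_set, 0.0), strength)   # OR -> max
    total = sum(blocking.values())
    if total == 0.0:
        return 0.0
    # Weighted-average defuzzification over the active output sets.
    return sum(mu * SET_I[s] for s, mu in blocking.items()) / total


print(round(percentage_blocking(5, 6), 2))  # 0.85
```

The printed value reproduces the 85% blocking derived by hand above.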



V. Results 



The simulation results, shown in Tables 1-6 and Graphs 1-6, indicate that the overall performance of the ATM switch improved when fuzzy logic was applied to the Dynamic Feedback Threshold scheme. In this work, the link bandwidth is taken as 155.5 Mbps, so the minimum delay suffered by a cell is 2.827 µs. Each input VBR source i, i = 1, 2, ..., N, is modeled by a two-state ON-OFF Interrupted Bernoulli Process (IBP). We first considered a switch of size 10×10, with input length = 4 and output length = 8. We applied a constant threshold (C.Th.) of 4, and the size of the output buffer (Bop) is kept at 10. The output buffer was divided into equal parts, namely two, three, and four, and feedback was applied after 50%, 33%, and 25% of the buffer space was filled, i.e., when N = 1, and 2, respectively, under the Dynamic Threshold Feedback (D.Th.Fb.) scheme. Simulation results were taken for these values of N under different load conditions. A gradual change is more intuitive here, and this has been achieved with fuzzy logic in our proposed Fuzzy Feedback (F.Fb.) scheme.
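The sources are described as two-state ON-OFF Interrupted Bernoulli Processes, but no parameters are given. The sketch below assumes per-slot transition probabilities p (ON to OFF) and q (OFF to ON) and a cell-arrival probability alpha in the ON state, with no arrivals while OFF; these names and values are illustrative assumptions. The long-run arrival rate alpha·q/(p + q) follows from the stationary probability q/(p + q) of being in the ON state.

```python
import random


def ibp_source(n_slots, p_on_to_off, q_off_to_on, alpha, start_on=True, rng=None):
    """Generate 0/1 cell arrivals per slot from a two-state ON-OFF IBP."""
    if rng is None:
        rng = random.Random()
    on = start_on
    arrivals = []
    for _ in range(n_slots):
        # In the ON state a cell arrives with probability alpha; none in OFF.
        arrivals.append(1 if on and rng.random() < alpha else 0)
        # Markov state transition at the end of the slot.
        if on:
            on = rng.random() >= p_on_to_off
        else:
            on = rng.random() < q_off_to_on
    return arrivals


def ibp_mean_rate(p_on_to_off, q_off_to_on, alpha):
    """Long-run arrivals per slot: alpha times the stationary ON probability."""
    return alpha * q_off_to_on / (p_on_to_off + q_off_to_on)
```

Seeding the generator, e.g. `ibp_source(10_000, 0.1, 0.3, 0.8, rng=random.Random(42))`, makes simulation runs repeatable, and the empirical mean of a long run should approach `ibp_mean_rate(0.1, 0.3, 0.8)`, i.e. 0.6 cells per slot.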

Results were obtained for three important performance indices, namely throughput, average cell delay, and cell loss probability, versus load. The performance of the proposed scheme was compared with the Constant Threshold and Dynamic Feedback Threshold schemes. The results show that all of the QoS parameters described above are functions of the offered load and the number of buffer parts (N), except that for the Constant Threshold scheme these parameters do not depend on N.

For low loads (L < 0.5), all the schemes provide about the same throughput (99%-100%), which shows that all incoming cells are served by the switch, so we limit our discussion to higher loads. For moderate loads (0.5 < L < 0.7), the rigidity of the Constant Threshold causes its throughput to decrease from 99% to 97%, while the remaining schemes again give the same results. The reason is that after every 50%, 33%, or 25% of the buffer space is filled, the network gets a proper signal to control the incoming burst of cells. At higher loads (0.7 < L < 0.9), the throughput of the Constant Threshold scheme decreases to 96%. Under these load conditions, the throughput of the Dynamic Feedback scheme increases gradually with N, while it remains constant for the proposed Fuzzy scheme.

The average cell delay of the Constant Threshold scheme increases rapidly as the offered load changes from low to high; again, this parameter does not depend on N. The average cell delay is very low for the Dynamic Feedback Threshold scheme and is at its minimum for the proposed Fuzzy scheme.

As with average cell delay, the results show that the proposed Fuzzy scheme also has the lowest average cell loss probability. The reason can be explained as follows.

The threshold function determines, for each cell burst, how many of the arriving cells to admit into the buffer. This function has a significant influence on network performance, including the fraction of cells lost due to dropping or excessive delays and the delay distribution of the cells. The traditional 'fixed' scheme uses a binary threshold: admit or no-admit, depending on the buffer occupancy. In the proposed Fuzzy scheme, the blocking decision is based on triangular membership functions.
Table 2: Average Cell Delay vs. Load when N = 1

Table 4: Average Throughput vs. Load when N = 2
VI. Discussion & conclusion



As shown in the previous section, the new fuzzy-logic-based scheme performs very well compared with the fixed threshold scheme. However, the price paid is increased time complexity. The fixed threshold case has a time complexity of O(1), because it only has to decide once whether the incoming burst can be accommodated. In this section, an analysis is carried out to determine the worst-case time complexity of the proposed scheme.

Step A - Fuzzification 

If the sample space of buffer occupancy is represented by n sets and the sample space of burst length by m sets, fuzzification is bounded in the worst case by time complexities of O(n) and O(m), respectively.

Step B - Look-up Table 

At the end of the fuzzification process, the membership functions and their membership values are obtained. There can be at most m membership functions for burst length and at most n for buffer occupancy. For each membership function of buffer occupancy, every membership function of burst length is taken, the look-up table is consulted, and the corresponding entry is noted. Each such lookup takes constant time. As it is performed a total of m·n times, the time complexity of this step is O(m·n).
Step C - Inference Engine

Assuming that these m·n data elements are clustered into m groups of n data elements each, finding the minimum of n data elements has time complexity O(n). As this procedure is repeated m times, the total time complexity of finding the minima is O(m·n). It can safely be assumed that the defuzzification sample space is represented by no more than l elements, where l ≤ m·n; this assumption is valid because at most m·n look-up table entries are referred to. Finding the maximum over these entries therefore again takes time of order O(m·n). The total time complexity of this step is thus O(m·n) + O(m·n).

Step D - Defuzzification 

Here, the fuzzy sets passed on along with the membership 
values are defuzzified to determine the crisp output value. 
Without losing generality, it can be assumed n>m. In this 
case, we can say that at the most m.n membership functions 
along with the values are passed on from step 3. For each 
function, respective de- fuzzification value is retried and 
multiplying with associated weight tubes a constant time. 
So worst case time complexity is 0 {m.n). Addition of all the 
elements after this has time complexity of 0 (m.n) and 
divided by addition of respective weight (complexity of 
0 (m.n) again). Division takes constant time out of time. 

So the total time complexity of this step (O(m.n) +0(m.n) 
+ 0 (m.n)). 

Total Time Complexity 

The total time complexity of the fuzzy scheme is obtained by adding the individual time complexities:

$$O(m) + O(n) + O(m \cdot n) + O(m \cdot n) + O(m \cdot n) + O(m \cdot n) + O(m \cdot n) + O(m \cdot n)$$

Using the result from , this turns out to be O(m·n). For n ≥ m, the time complexity of the new scheme can be expressed as O(n²).

Thus, the new scheme can easily be implemented for small values of n; for large n, the time complexity may become a liability.

In this paper, we have introduced a fuzzy approach to congestion control in ATM networks. When a number of bursty traffic sources add cells, the network is inevitably subject to congestion. Various traditional approaches to congestion management reported in the literature use a 'fixed' threshold, i.e., either a binary value or a limited number of predetermined values based on cell priorities, to determine when to permit or refuse entry of cells into the buffer. The aim is to achieve a desired trade-off between the number of cells carried through the network, the propagation delay of the cells, and the number of discarded cells. Conventional thresholds suffer from some fundamental limitations. One is the difficulty of obtaining complete statistics on the input traffic to a network; as a result, it is not easy to accurately determine the equivalent capacity or effective thresholds for multimedia high-speed networks under various bursty traffic conditions. Moreover, these schemes provide optimal solutions only in the steady state. In the proposed scheme, buffer occupancy and burst length are fuzzified using triangular membership functions; from these membership values and the corresponding sets, the blocking to be offered is found, again in fuzzy terms. Then, by applying a suitable defuzzification method, the percentage blocking to be offered at a particular buffer occupancy level and burst length can be determined. A comparative study has revealed that the proposed scheme achieves lower average delay and higher throughput than the constant and dynamic feedback threshold schemes, together with a lower cell loss probability.
Abstract- Routing is the act of moving information across a network from a source to a destination. Along the way, at least one intermediate node is typically encountered. Routing is often contrasted with bridging, which might seem to accomplish precisely the same thing to the casual observer. Routing involves two basic activities: determining optimal routing paths and transporting information groups through a network. Routing also refers to path finding between a source and a destination. This literature review investigates some of the approaches to path finding in different networks that appear in the current research literature. A selected set of approaches is highlighted and set in a broader context, illustrating the various aspects of path finding in different networks. Because path finding is applicable to many kinds of networks, such as roads, utilities, water, electricity, telecommunications, and computer networks alike, the total number of algorithms developed over the years is immense. The aim of this survey is to comprise a selected cross-section of approaches to path finding and related fields of research, such as transportation GIS, network analysis, operations research, artificial intelligence, and robotics, to mention just a few examples where path-finding theories are employed.

Keywords- Routing, Shortest path algorithms, ITS, KSP. 

I. Introduction 

This paper surveys various shortest-path routing algorithms for transportation networks. The routing algorithm is the key element in any network's performance, and thus it can be seen as the brain of the network. "Was it possible to find a path through the city crossing each of its seven bridges once and only once and then returning to the origin?" This was Euler's famous "Konigsberg bridge" question, dating back as far as 1736. It is often seen as the starting point of modern path finding. Euler's methods formed the basis of what is now known as graph theory, and this theory in turn paved the way for path-finding algorithms. In long-distance road travel, successful route planning, both prior to travelling and en route, is essential to finding the optimal path from origin to destination. "Optimal" refers to shortest time, shortest distance, or least total cost, the latter being of major concern in some parts of the country, where travelling by car may mean many costly ferry crossings and expensive toll roads in order to get from one's point of departure to one's destination.
Path finding can take place in a fixed, static network, where the costs of traversing the network are set, or in a dynamic network, where the cost of traversing the network varies over the time of traversal. Because path finding is applicable to many kinds of networks, such as roads, utilities, water, electricity, telecommunications, and computer networks alike, the total number of algorithms developed over the years is immense. The aim of this survey is to comprise a selected cross-section of approaches to path finding and related fields of research, such as transportation GIS, network analysis, operations research, artificial intelligence, and robotics, to mention just a few examples where path-finding theories are employed.

Road networks are the backbone of modern society. Consequently, the reliability of the road network is a decisive factor not only in terms of market outreach and competition, but also in terms of continuity, to ensure 24/7 operation of the community we live in. Any threat to the reliability of the road network constitutes a vulnerable spot, a weakness that needs to be addressed so that the network does not fail given the right (in fact, "wrong") circumstances. This is of particular concern in sparse, rural networks, because what by urban standards is a minor degradation (e.g., a car accident resulting in queuing, delays, and diversions) may have severe consequences in a rural setting (e.g., blocking the only access road for hours, or even days or weeks). One hazard to transportation networks that has emerged recently, and that may become an increasing concern in the near future, is the effect of global climate change, with extreme weather and precipitation patterns not seen before, closing or degrading links that were thought invulnerable to such threats (Askildsen, 2004).

II. Intelligent Transport Systems

In the recent decade, road transportation systems have
undergone a considerable increase in complexity and
proneness to congestion. From a user's point of view, what
matters most in relation to a road network is the following:
Can I, at the desired time of departure, get from A to B by
using the intended route and means of transport, and arrive
at the desired time? That would be the best case. Or does no
route or means of travel exist at all that can take me from A
to B at the desired time of departure, let alone arriving at
the desired time? To the user, that would be the worst case.
This gave rise to the field of ITS, Intelligent Transport
Systems, with the goal of applying and merging advanced
technology to make transportation safer and more efficient,
with less congestion, pollution and environmental impact. In
working towards this goal, ITS can take many different
forms. Vehicle location and navigation systems are
one of these forms and have come along with the emerging
field of transport telematics. Transport telematics implies the
large-scale integration and implementation of 
telecommunication and informatics technology in the field 
of transportation, penetrating all areas and modes of 
transport, the vehicles, the infrastructure, the organization, 
and management of transport. 

Zhao (1997) distinguishes between route planning and route 
guidance as two key elements in vehicle location and 
navigation systems as part of ITS. Route planning is the 
process that helps vehicle drivers plan a route prior to 
driving a specific part of his or her journey. Route guidance 
is the real-time process of guiding the driver along the route 
generated by a route planner. 

Huang et al. (1995) subdivide route guidance even
further, distinguishing between centralized and
decentralized route guidance. In the latter, vehicles
conduct their own path finding using on-board computers
and static road maps on CD-ROMs, applying heuristic
search algorithms. Centralized route guidance relies on
traffic management centers (TMC) to answer path queries 
submitted by vehicles linked to it. In this case, Huang et al. 
(1995) describe a central server holding a materialized view 
of all shortest paths at that given time, accessed by lookup 
requests from the vehicles equipped with this system. 
Although not explicitly stated, it can be assumed that this 
also is the case in the Advanced Traveler Information 
System (ATIS) detailed by Shekar and Fetterer (1996) or the 
ADVANCE project portrayed by both Revels (1998) and 
Zhao (1997). Boyce et al. (1997) provide a detailed 
evaluation study of the ADVANCE project for further 
reference. 

III. Shortest Path Algorithms

Efficient management of networks requires that the shortest
route from one point (node) to another is known; this is
termed the shortest path. It is often necessary to be able to
determine alternative routes through the network, in case
any part of the shortest path is damaged or busy. The
analysis of transportation networks is one of many
application areas in which the computation of shortest paths
is one of the most fundamental problems, and shortest paths
have been the subject of extensive research for many years.
The shortest path problem was one of the first network
problems studied in operations research. Given two specific
nodes s and t in the network, the goal is to find a minimum-cost
way to go from s to t. Several algorithms for computing
the shortest path between two nodes of a graph are known.
The one described here is due to Dijkstra (1959). Each node is
labeled with its distance from the source node along the
best-known path. Initially, no paths are known, so all nodes
are labeled with infinity. As the algorithm proceeds and paths
are found, the labels may change, reflecting better paths. A
label may be tentative or permanent. Initially, all labels are
tentative. When it is discovered that a label represents the
shortest possible path from the source to that node, it is made
permanent and never changed thereafter.

A network consists of arcs, or links, and nodes. The fastest
path is calculated as a function of the costs associated with
travelling the links. Even though the research literature tends
to group the types of shortest path problems slightly
differently, one can distinguish, in general, between paths that
are calculated as one-to-one, one-to-some, one-to-all,
all-to-one, or all-to-all shortest paths. Software packages
solving static network shortest path problems usually perform
a once-off all-to-all calculation for all nodes, from which
subsequent routes are then derived. Clearly, this approach is
not feasible for dynamic networks, where the travel cost is
time-dependent or randomly varying. However, the majority of
published research on shortest path algorithms has dealt
with static networks that have fixed topology and fixed
costs. A few early attempts at dynamic approaches,
referenced by Chabini (1997), are Cooke and Halsey (1966)
and Dreyfus (1979). Not more than a decade ago, Van Eck
(1990) reported several hours as the average time for a
computer to churn through an all-to-all calculation on a
250-node small-scale static network, and several days on a
16,000-node large-scale network.
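The labeling procedure described above, with tentative labels that become permanent, can be sketched in Python as follows. This is a minimal illustration, not code from any of the cited works. It assumes a binary heap (`heapq`) is used to find the smallest tentative label, and the small network `roads` and its costs are hypothetical.

```python
import heapq
import math


def dijkstra(graph, source):
    """One-to-all shortest paths with tentative/permanent labels.

    graph: dict mapping every node to a list of (neighbor, cost) pairs,
    with non-negative costs. Returns (dist, pred): dist[v] is the final
    (permanent) label of v, pred[v] its predecessor on the best path.
    """
    dist = {v: math.inf for v in graph}   # all labels start at infinity
    pred = {v: None for v in graph}
    dist[source] = 0
    permanent = set()
    heap = [(0, source)]                  # tentative labels, smallest first
    while heap:
        d, u = heapq.heappop(heap)
        if u in permanent:
            continue                      # outdated entry for a settled node
        permanent.add(u)                  # smallest tentative label is final
        for v, cost in graph[u]:
            if d + cost < dist[v]:        # a better path to v was found
                dist[v] = d + cost
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


def path_to(pred, target):
    """Rebuild the path to `target` by following predecessors."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


# Hypothetical four-node road network (costs in minutes).
roads = {
    "A": [("B", 4), ("C", 2)],
    "B": [("D", 5)],
    "C": [("B", 1), ("D", 8)],
    "D": [],
}
dist, pred = dijkstra(roads, "A")
print(dist)                # {'A': 0, 'B': 3, 'C': 2, 'D': 8}
print(path_to(pred, "D"))  # ['A', 'C', 'B', 'D']
```

Outdated heap entries are skipped rather than updated in place. This is a common way to emulate a decrease-key operation with `heapq`.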

One way of dealing with dynamic networks is splitting 
continuous time into discrete time intervals with fixed travel 
costs, as noted by Chabini (1997). Thus, understanding 
shortest path algorithms in static networks becomes 
fundamental to working with dynamic networks. 

A. Shortest Path In Static Networks 

Several algorithms and data structures for algorithms have
been put forward since the classic shortest path algorithm by
Dijkstra (1959). In its modified version, this algorithm
computes a one-to-all path in all directions from the origin
node and terminates when the destination has been reached.
Denardo and Fox (1979) introduce the techniques of
reaching, pruning, and buckets. The original Dijkstra
algorithm explores an unnecessarily large search area, which
led to the development of heuristic searches, among them
the A* algorithm, which searches in the direction of the
destination node. This avoids exploring directions with
unfavorable results and reduces computation time.
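Below is a minimal A* sketch. It assumes each node has planar coordinates and every link costs at least the straight-line distance between its end nodes. Under that assumption, the Euclidean estimate never overestimates the remaining cost, so the search can stop as soon as the destination is taken from the queue. The function name, the `coords` mapping and the example network are illustrative assumptions, not taken from the cited studies.

```python
import heapq
import math


def a_star(graph, coords, source, target):
    """A* search from source to target on a road network.

    graph: node -> list of (neighbor, cost); coords: node -> (x, y).
    Assumes every link costs at least the straight-line distance between
    its end nodes, so the Euclidean heuristic never overestimates.
    """
    def h(v):  # straight-line estimate of the remaining cost
        (x1, y1), (x2, y2) = coords[v], coords[target]
        return math.hypot(x1 - x2, y1 - y2)

    g = {source: 0.0}
    pred = {source: None}
    heap = [(h(source), source)]      # ordered by f = g + h
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u == target:               # destination reached: stop early
            path = []
            while u is not None:
                path.append(u)
                u = pred[u]
            return g[target], path[::-1]
        if u in closed:
            continue
        closed.add(u)
        for v, cost in graph[u]:
            new_g = g[u] + cost
            if new_g < g.get(v, math.inf):
                g[v] = new_g
                pred[v] = u
                heapq.heappush(heap, (new_g + h(v), v))
    return math.inf, []


# Hypothetical grid-like network: coordinates and costs in km.
coords = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1), "E": (2, 1)}
graph = {
    "A": [("B", 1), ("C", 1)],
    "B": [("D", 1)],
    "C": [("D", 1.5)],
    "D": [("E", 1)],
    "E": [],
}
print(a_star(graph, coords, "A", "E"))  # (3.0, ['A', 'B', 'D', 'E'])
```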

A significant improvement is seen in the bi-directional 
search, computing a path from both origin and destination, 
and ideally meeting in the middle. In relation to this search
technique, it should be remarked that Jacob et al. (1998) 
discard bi-directional algorithms as impractical in their 
computational study of routing algorithms for realistic 
transportation networks. 
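The bi-directional idea can be sketched as two Dijkstra searches. One runs forward from the origin, the other backward from the destination over reversed links. They stop once the smallest labels on the two frontiers together cannot improve on the best connecting path found so far. This is a textbook variant assumed for illustration, not the implementation evaluated by Jacob et al. (1998). The function name and example network are hypothetical.

```python
import heapq
import math


def bidirectional_dijkstra(graph, source, target):
    """Shortest source-target distance by searching from both ends.

    graph: node -> list of (neighbor, cost), costs non-negative.
    The backward search runs on the reversed graph from the target.
    """
    rev = {v: [] for v in graph}
    for u, edges in graph.items():
        for v, cost in edges:
            rev[v].append((u, cost))
    adj = (graph, rev)
    dist = ({source: 0}, {target: 0})    # forward and backward labels
    heaps = ([(0, source)], [(0, target)])
    settled = (set(), set())
    best = 0 if source == target else math.inf   # best s-t path so far
    # If one search runs out of nodes, `best` already holds the answer.
    while heaps[0] and heaps[1]:
        # Stop when the two frontiers together cannot beat `best`.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in settled[side]:
            continue
        settled[side].add(u)
        for v, cost in adj[side][u]:
            nd = d + cost
            if nd < dist[side].get(v, math.inf):
                dist[side][v] = nd
                heapq.heappush(heaps[side], (nd, v))
            # If the other search has reached v, the two searches meet there.
            if v in dist[1 - side]:
                best = min(best, nd + dist[1 - side][v])
    return best


# Hypothetical four-node road network (costs in minutes).
roads = {
    "A": [("B", 4), ("C", 2)],
    "B": [("D", 5)],
    "C": [("B", 1), ("D", 8)],
    "D": [],
}
print(bidirectional_dijkstra(roads, "A", "D"))  # 8
```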

Zhan and Noon (1996) conducted a comprehensive study of
shortest path algorithms on 21 real road networks from 10
different states in the U.S., with networks ranging from
1600/500 to 93000/264000 nodes/arcs. In this study,
Dijkstra-based algorithms, though differing in their data
structures, outperform other algorithms in one-to-one or
one-to-all fastest path problems.

In summary, the A* algorithm, along with Dijkstra-based
algorithms, is preferred in most of the literature reviewed
by the author. It is in fact noteworthy that the
Dijkstra algorithm has prevailed to the present day, attesting
to its general validity.

B. K-Shortest Path In Dynamic Networks 

Recent advances in computer and communications technology,
together with developments in ITS, have sparked renewed
interest in dynamic networks. This interest in the concept of
dynamic management of transportation has also brought forward
a set of algorithms that are particularly aimed at optimizing the
run-time of computations on large-scale networks.

Chabini (1998) classifies dynamic shortest path problems
according to: (a) fastest versus minimum cost
(or shortest) path problems; (b) discrete versus continuous
representation of time; (c) first-in-first-out (FIFO) networks
versus non-FIFO networks, in which a vehicle departing at a
later time than a previous vehicle can arrive at the
destination before the previous vehicle; (d) waiting is
allowed at nodes versus waiting is not allowed; (e) questions 
asked: one-to-all for a given departure time or all departure 
times, and all-to-one for all departure times; and (f) integer 
versus real valued link travel costs. 

Fu and Rilett (1996) investigate what they call the dynamic
and stochastic shortest path problem by modeling link travel
times as a continuous-time stochastic process. The aim of
their research was to estimate the travel time of a particular
path over a given time period. They deviate from the
mainstream appraisal of the A* algorithm and advocate a
k-shortest path approach. The reason for this is that standard
shortest path algorithms may fail to find the minimum expected
paths, particularly when dealing with non-linear
optimization, as is the case in developing travel time
estimation models. However, in the absence of real data, their
research is based on a hypothetical pattern of change in travel
time.

Building on research into path finding algorithms in static
networks, Chabini (1997) remarks that a time-space
expansion representation can be used in dynamic networks,
applying discrete time intervals with fixed costs. Hence,
depending on how time is treated, dynamic shortest path
problems can be subdivided into two types: discrete and
continuous. In the discrete case, if using 15-second time
intervals, a full 24-hour implementation would involve
calculations on 5760 time intervals (24 × 3600 s / 15 s),
multiplied by the number of nodes and links. Chabini (1997)
makes a clear separation between fastest time paths, in which
the cost of a link is the travel time of that link, and minimum
cost paths, in which link costs can be of a general form. The
difference between these two was nonetheless not explored
until Chabini (1998).
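In the time-space expansion, each state is a pair (node, interval). The sketch below finds one-to-all fastest paths for a single departure interval by scanning these states in time order. It assumes integer link travel times measured in whole intervals (at least one), no waiting at nodes, and links that need not be FIFO. The function and variable names and the example network are hypothetical. The constant reproduces the 5760 intervals computed above.

```python
INTERVALS_PER_DAY = 24 * 60 * 60 // 15   # 15-second intervals: 5760


def earliest_arrival(nodes, links, source, depart, horizon):
    """One-to-all fastest paths in a discrete-time dynamic network.

    links: dict mapping (i, j) to a function d(t) that returns the travel
    time, in whole intervals (>= 1), of link (i, j) entered at interval t.
    Waiting at nodes is not allowed and FIFO is not assumed. The states
    (node, t) are the nodes of the time-space expanded network.
    """
    out = {i: [] for i in nodes}
    for (i, j), d in links.items():
        out[i].append((j, d))
    reached = {(source, depart)}
    arrival = {source: depart}
    # Every link moves forward in time, so scanning intervals in
    # increasing order handles each state after all its predecessors.
    for t in range(depart, horizon):
        for i in nodes:
            if (i, t) not in reached:
                continue
            for j, d in out[i]:
                t_arr = t + d(t)
                if t_arr <= horizon and (j, t_arr) not in reached:
                    reached.add((j, t_arr))
                    arrival[j] = min(arrival.get(j, t_arr), t_arr)
    # Fastest travel time, in intervals, to each reachable node.
    return {j: a - depart for j, a in arrival.items()}


# Hypothetical example: link A->B is congested during intervals 0-2.
nodes = ["A", "B", "C"]
links = {
    ("A", "B"): lambda t: 6 if t < 3 else 2,
    ("A", "C"): lambda t: 1,
    ("C", "B"): lambda t: 2,
}
print(earliest_arrival(nodes, links, "A", 0, INTERVALS_PER_DAY))
# {'A': 0, 'B': 3, 'C': 1}
print(earliest_arrival(nodes, links, "A", 4, INTERVALS_PER_DAY))
# {'A': 0, 'B': 2, 'C': 1}
```

Departing at interval 0, the detour through C is faster. Once the congestion on A->B has cleared, the direct link wins.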

Chabini (1997) identifies two key questions in dynamic path
finding: (1) what are the fastest paths from one origin to all
destinations departing at a given time, and (2) what are the
fastest paths from all nodes to one destination for all
departure times. He sees the latter as the more significant in
relation to ITS, which is true if one assumes that ITS aims
at finding the best path for multiple vehicles with the same
destination. In Chabini (1998) the focus broadens slightly.
Now three questions are put forward: (1) one-to-all fastest
path at a given time interval, (2) all-to-one fastest path for
all departure times and (3) all-to-one minimum cost path for
all departure time intervals.

Chabini (1997, 1998) places emphasis on the all-to-one
minimum cost path as the key problem in relation to
ITS, the reason being that only a limited set of all network
nodes are destination nodes in realistic road networks, while
a considerably larger number of nodes will be origin nodes
(moving vehicles tend to converge to the same goal rather
than spread in all directions).
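The all-to-one minimum cost problem for all departure intervals can be illustrated with a single backward sweep over time. This is in the spirit of the decreasing-order-of-time approach described by Chabini (1998), but it is a simplified sketch rather than his published algorithm. It evaluates $C_i(t) = \min_j \{ c_{ij}(t) + C_j(t + d_{ij}(t)) \}$ under three assumptions: every link travel time is at least one interval, waiting at nodes is not allowed, and arrivals at or after the planning horizon are infeasible. All names and the example data are hypothetical.

```python
import math


def all_to_one_min_cost(nodes, links, dest, horizon):
    """All-to-one minimum-cost paths for all departure intervals.

    links: dict mapping (i, j) to (d, c), where d(t) >= 1 is the travel
    time in intervals and c(t) the cost of entering link (i, j) at t.
    Simplifying assumption: arrivals at or after `horizon` are treated
    as infeasible. Returns (C, nxt): C[i][t] is the least cost from i to
    dest departing at t, and nxt[i][t] the next node on that path.
    """
    C = {i: [math.inf] * horizon for i in nodes}
    nxt = {i: [None] * horizon for i in nodes}
    C[dest] = [0.0] * horizon
    # Since d(t) >= 1, C[i][t] only depends on labels at later intervals,
    # so a single sweep in decreasing order of time is enough.
    for t in range(horizon - 1, -1, -1):
        for (i, j), (d, c) in links.items():
            if i == dest:
                continue
            t_arr = t + d(t)
            if t_arr < horizon and c(t) + C[j][t_arr] < C[i][t]:
                C[i][t] = c(t) + C[j][t_arr]
                nxt[i][t] = j
    return C, nxt


# Hypothetical network: a direct link A->D whose cost drops from 10 to 4
# after interval 0, and a detour A->B->D costing 5 in total.
links = {
    ("A", "D"): (lambda t: 3, lambda t: 4 if t >= 1 else 10),
    ("A", "B"): (lambda t: 1, lambda t: 2),
    ("B", "D"): (lambda t: 1 if t >= 2 else 4, lambda t: 3),
}
C, nxt = all_to_one_min_cost(["A", "B", "D"], links, "D", horizon=6)
print(C["A"])    # [5.0, 4.0, 4.0, 5.0, inf, inf]
print(nxt["A"])  # ['B', 'D', 'D', 'B', None, None]
```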

Horn (1999) continues along the research trails of Chabini
(1997) and Fu and Rilett (1996), but uses a less detailed
articulation of travel dynamics, reflecting, as he puts it, the
recognition that information about network conditions in
most parts of the world is likely to be sparse and that
only estimates of average speed on individual network
links are available in most cases. With the presumption that
these estimates allow for variation in speed, congestion and
delays at nodes, he studied a number of Dijkstra-variant
algorithms that address these particular conditions. Most
importantly, he proposes an algorithm that calculates an
approximation of the travel duration of the shortest time path
(path travel time), independent of the particular navigation
between nodes. For an experienced vehicle driver, the estimated
travel time may be more important than the exact route to
follow. This is a noteworthy addition to the fastest path
algorithms in dynamic networks.

C. K-Shortest Path 

The shortest path through a network is the least cost route 
from a given node to another given node and this path will 
usually be the preferred route between those two nodes. 
When the shortest path between two nodes is not available 
for some reason, it is necessary to determine the second 
shortest path. If this too is not available, a third path may be 
needed. The series of paths thus derived are known
collectively as the k-shortest paths (KSP) and represent the
first, second, third, ..., k-th paths, typically of least length, from
one node to another. The k-shortest path problem is a variant
of the shortest path problem, where one intends to determine
k paths $p_1, \ldots, p_k$ (in order) between two fixed nodes. The
k-shortest paths represent an ordered list of the alternative
routes available. 

In obtaining the KSPs, it is normally necessary to determine 
independently the shortest path (k = 1) between the two given
nodes before computation of the remaining k-1 shortest 
paths can be carried out. The term shortest does not just 
apply to the distance between two nodes, but can involve 
any single component made up of one or more factors, 
including cost, safety or time, that put a weighting on the 
route. KSP algorithms are thus widely used in the fields of 
telecommunications, operations research, computer science 
and transportation science. 

A. W. Brander and M. C. Sinclair made a comparative study
of k-shortest path algorithms. Four algorithms were selected
for more detailed study from over seventy papers written on
this subject since the 1950s. The network was represented
as a graph G = (V, E), where V is a finite set of n nodes or
vertices V(G) and E is a finite set of m edges (i.e. links or
arcs) E(G) that connect the nodes. The work presented was
driven by the desire to find a faster algorithm to calculate
the KSPs between nodes in a network. The two original
algorithms, Yen's and Lawler's, were implemented to provide a
reference for the expected speed and the improvement available.
Katoh's algorithm was included as it represented a comparatively
recent update and modification of Yen's. The fourth algorithm,
Hoffman's, was implemented after further study, as it was felt
that it had the potential to outperform the other algorithms.
Addressing the k-shortest path problem, Jose L. Santos
described three codes, a removing path algorithm and the first
and second versions of a deviation path algorithm, and compared
them on random and grid networks produced by random generators.
The codes were also tested on USA road networks. One million
paths were ranked in less than 3 seconds on random
instances, with 10,000 seconds for real-world instances.
Dreyfus and Yen cite several additional papers on this
subject going back as far as 1957.
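Yen's algorithm is one of the two original algorithms in the Brander and Sinclair comparison. It follows the pattern outlined earlier: the shortest path (k = 1) is found first. Each further path is obtained by deviating from the previously accepted path at a "spur" node, while blocking links already used by accepted paths that share the same root. The Python sketch below is an illustrative reconstruction of the textbook form of the algorithm, not code from any of the studies cited. The helper names and the example network are hypothetical.

```python
import heapq
import math


def shortest_path(graph, source, target, banned_edges=(), banned_nodes=()):
    """Dijkstra from source to target, skipping banned links and nodes."""
    dist, pred, done = {source: 0}, {source: None}, set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = pred[u]
            return d, path[::-1]
        done.add(u)
        for v, cost in graph[u]:
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            if d + cost < dist.get(v, math.inf):
                dist[v], pred[v] = d + cost, u
                heapq.heappush(heap, (dist[v], v))
    return math.inf, None


def yen_ksp(graph, source, target, k):
    """Yen's k shortest simple (loopless) paths, cheapest first."""
    cost, path = shortest_path(graph, source, target)   # k = 1
    if path is None:
        return []
    found, candidates = [(cost, path)], []
    link_cost = {(u, v): c for u in graph for v, c in graph[u]}
    while len(found) < k:
        prev = found[-1][1]
        for i in range(len(prev) - 1):          # each node can be a spur
            spur, root = prev[i], prev[: i + 1]
            # Block the next link of every found path sharing this root,
            # and the root's own nodes so the new path stays loopless.
            banned_edges = {(p[i], p[i + 1]) for _, p in found
                            if p[: i + 1] == root}
            spur_cost, spur_path = shortest_path(
                graph, spur, target, banned_edges, set(root[:-1]))
            if spur_path is None:
                continue
            total = root[:-1] + spur_path
            root_cost = sum(link_cost[e] for e in zip(root, root[1:]))
            if all(total != p for _, p in candidates):
                heapq.heappush(candidates, (root_cost + spur_cost, total))
        if not candidates:
            break                               # fewer than k paths exist
        found.append(heapq.heappop(candidates))
    return found


# Small hypothetical network.
graph = {
    "C": [("D", 3), ("E", 2)],
    "D": [("F", 4)],
    "E": [("D", 1), ("F", 2), ("G", 3)],
    "F": [("G", 2), ("H", 1)],
    "G": [("H", 2)],
    "H": [],
}
for cost, path in yen_ksp(graph, "C", "H", 3):
    print(cost, path)
# 5 ['C', 'E', 'F', 'H']
# 7 ['C', 'E', 'G', 'H']
# 8 ['C', 'D', 'F', 'H']
```

Several paths in this example cost 8. The sketch breaks such ties by comparing the node lists, which is an arbitrary but deterministic choice.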

Shi-Wei LEE and Cheng-Shong WU proposed an algorithm
for finding the k-best paths connecting a pair of nodes in a
graph G. Graph extension is used to transform the k-best paths
problem into a problem that can be solved with well-known maximum
flow (MaxFlow) and minimum cost network flow (MCNF)
algorithms. Two kinds of path finding procedures are often
needed in the design of reliable communication networks.
The first one is to find k shortest paths between a pair of
nodes. Those paths may be simple or allow loops.

For the k-shortest simple paths problem, Lawler proposed
the best known algorithm, with time complexity
$O(k(m + n\log n))$ in undirected graphs, where n and m are the
numbers of nodes and links of the input network. For the directed
counterpart, Katoh et al. gave the best known bound of
$O(kn(m + n\log n))$. Recently, Eppstein developed an efficient KBP
algorithm for finding the k shortest paths allowing loops in
$O(m + n\log n + k)$ time, for highly reliable communication networks.
The solution output by KBP is a true optimal solution for k
disjoint paths, and it is very useful for planning highly
reliable communication networks.

Francesca Guerriero, Roberto Musmanno, Valerio 
Lacagnina and Antonio Pecorella dealt with the problem of 
finding the k shortest paths from a single origin node to all 
other nodes of a directed graph. The data structure used is 
characterized by a set of k lists of candidate nodes, and the 
proposed methods differ in the strategy used to select the 
node to be extracted at each iteration. 
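For paths that may contain loops, there is a simple way to obtain the k shortest paths from one origin to every other node. A Dijkstra-style search is allowed to take each node from the priority queue up to k times. The i-th removal of a node then gives the cost of its i-th shortest path. This sketch is given only for illustration. It is neither Eppstein's algorithm nor the k-list method of Guerriero et al., and the function name and example network are hypothetical.

```python
import heapq


def k_shortest_walks(graph, source, k):
    """Costs of the k shortest paths (loops allowed) from source to all nodes.

    graph: node -> list of (neighbor, cost), costs non-negative.
    Each node may be taken from the priority queue at most k times; its
    i-th removal gives the cost of its i-th shortest path from source.
    """
    costs = {v: [] for v in graph}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if len(costs[u]) == k:
            continue                  # k paths to u already found
        costs[u].append(d)
        for v, c in graph[u]:
            if len(costs[v]) < k:     # only extend towards unfinished nodes
                heapq.heappush(heap, (d + c, v))
    return costs


# Hypothetical network with a cycle A <-> B, so paths may revisit nodes.
graph = {"A": [("B", 1)], "B": [("A", 1), ("C", 2)], "C": []}
print(k_shortest_walks(graph, "A", 3))
# {'A': [0, 2, 4], 'B': [1, 3, 5], 'C': [3, 5, 7]}
```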

IV. Conclusion 

Evaluation of any heuristic method involves comparing a
number of criteria that relate to various aspects of algorithm
performance. Examples of such criteria are running time,
quality of solution, ease of implementation, robustness, and
flexibility (Barr et al., 1995; Cordeau et al., 2002). Since
heuristic methods are ultimately designed to solve real-world
problems, flexibility is an important consideration. An
algorithm should be able to easily handle changes in the
model, the constraints and the objective function. As for
robustness, an algorithm should not be overly sensitive to
differences in problem characteristics: a robust heuristic
should not perform poorly on any instance. Moreover, an
algorithm should be able to produce good solutions every time
it is applied to a given instance. This needs to be highlighted
since many heuristics are non-deterministic and contain
random components, such as randomly chosen parameter values.
The outputs of separate executions of these non-deterministic
methods on the same problem are in practice generally not the
same, which makes it difficult to analyze and compare results.
Using only the best results of a non-deterministic heuristic,
as is often done in the literature, may create a false picture
of its real performance. Based on these considerations, we
would like to carry out further research on public transport
travel using a k-shortest path algorithm (based on Dijkstra's
algorithm) that takes user preferences into account.
Abstract- HierarchyMap is a novel treemap visualization
method for representing large volumes of hierarchical
information in 2-dimensional space. The HierarchyMap algorithm
is a new ordered treemap algorithm. Results of the
implementation of the HierarchyMap treemap algorithm show that
it is capable of representing several thousand hierarchical
data items in 2-dimensional space on computer and Personal
Digital Assistant (PDA) screens, while still maintaining the
qualities found in existing treemap algorithms, such as
readability, low aspect ratio, reduced run time, and a reduced
number of thin rectangles. The HierarchyMap treemap algorithm
is implemented in the Java programming language and tested
with datasets of departmental and faculty systems of
universities, family trees, and plant and animal taxonomy
structures.

Keywords- Treemaps, Aspect ratio, HierarchyMap, Hierarchical data, Tree-like structure, Node.

I. INTRODUCTION 

Large volumes of the data we use today are represented in
hierarchical structures. In their natural forms, such structures
include information about corporate organizations,
university/departmental structures, family trees, manual
directories, Internet addressing, library cataloging,
computer programs, animal and plant taxonomy, etc.
The contents and organization of these structures are easily
understood if they are small, but very difficult to understand
when the structures become large (Mark Bruls et al., 2000).
These problems led to the concept of treemaps
(Shneiderman and Johnson, 1991). A treemap embodies the
notion of turning a tree into a planar space-filling map. It is
described as a space-filling visualization method capable of
representing large hierarchical collections of quantitative
data. A treemap works by dividing the display area into a
nested sequence of rectangles whose areas correspond to an
attribute of the dataset, effectively combining aspects of a
Venn diagram and a pie chart (Shneiderman et al., 2002).
With treemaps, large hierarchical structures can be viewed
easily because the treemap visualization method maps
hierarchical information into a rectangular 2-dimensional
display in a space-filling manner such that 100% of the
designated display space is utilized. Interactive
control allows users to specify the presentation of both
structural (depth bounds, etc.) and content (display
properties) information (Shneiderman, 1992). This is in
contrast to traditional static methods of displaying
hierarchically structured information, which generally
make poor use of display space or hide vast
quantities of information from users. With the treemap
method, sections of the hierarchy containing more important
information can be allocated more display space, while
portions of the hierarchy that are less important to the
specific task can be allocated less space. Although
treemaps were originally designed to visualize files on a hard
drive (Shneiderman, 1992), they have been applied to a wide
variety of areas ranging from financial analysis, business
intelligence, money markets and stock portfolios to sports
reporting (Wattenberg, 1999). A key ingredient of a
treemap is the algorithm used to create the nested rectangles
that make up the map. This set of rectangles is referred to
as the layout of the treemap.

In this work, we developed and implemented a novel
HierarchyMap algorithm. The idea behind this algorithm is
to lay out information from hierarchical structures on nested
rectangles, which we call the HierarchyMap treemap. With
this algorithm, every attribute in a hierarchical structure is
represented by a rectangular node on the treemap. Each
rectangle on the treemap corresponds to an attribute of the
dataset. Each node representing a main attribute of a
tree-like structure can in turn display the sub-nodes at the
next lower level of the hierarchical structure. This process
continues until all the information in the different levels of
the tree hierarchy has been displayed, one level after the
other, on the same 2-dimensional screen.

II. Related Works

There are various methods that have been applied to display
the structure of information, and one of these techniques is the
traditional tree diagram, where elements are shown as nodes
and relations are shown as links from parent to child nodes.
Improved techniques have been presented to enhance
the efficiency and qualities of such diagrams in both
2-dimensional and 3-dimensional space (Furnas, 1986),
(Knuth, 1973), (Bruggemenn, 1989), and (Card et al., 1991).
These techniques have been found to be effective for small
trees, but generally ineffective when more than a few hundred
elements have to be visualized simultaneously. The major
reason for this limitation is that node and link diagrams use
the display space inefficiently, as depicted in Figure 1
below.




Fig. 1: Tree diagram for representing Hierarchical Data
Structure (Mark Bruls et al., 2000)




Fig. 2: TreeMap representing the Hierarchical Data 
Structure in fig. 1 (Mark Bruls et al., 2000) 



The treemap, shown in Figure 2 above, was developed and
introduced to solve this space-usage problem by using
the full display space to visualize the contents of the tree
(Johnson and Shneiderman, 1991), (B. Shneiderman,
1992). As illustrated in Figure 2 above, the slice-and-dice
treemap algorithm splits the display rectangle alternately along
horizontal and vertical lines while recursively traversing a
hierarchically structured dataset in a top-down direction
(Shneiderman, 1992). Slice-and-dice treemaps are very
effective when size is the most important feature to be
displayed. However, this method also has the problem of
creating layouts that contain many rectangles with a high
aspect ratio. Therefore, many other treemap layout
algorithms have been proposed to overcome this limitation.
These include the Cluster and Squarified treemap
algorithms.
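Here is a minimal recursive sketch of the slice-and-dice procedure described above. It assumes each node is a hypothetical `(label, size, children)` tuple whose size equals the sum of its children's sizes. Children divide their parent's rectangle in proportion to their sizes, alternating between side-by-side and stacked placement at each level. The function name and example tree are illustrative only.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap layout (sketch).

    node: (label, size, children), where size is the sum of the
    children's sizes (or the item's own weight for a leaf). Children
    divide their parent's rectangle in proportion to their sizes:
    side by side at even depths, stacked at odd depths.
    Returns a list of (label, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    label, size, children = node
    out.append((label, x, y, w, h))
    offset = 0.0                        # fraction of the parent used so far
    for child in children:
        frac = child[1] / size
        if depth % 2 == 0:              # even depth: divide the width
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:                           # odd depth: divide the height
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical two-level hierarchy drawn in a 6-by-4 display area.
tree = ("root", 6, [("a", 3, []), ("b", 3, [("b1", 1, []), ("b2", 2, [])])])
for rect in slice_and_dice(tree, 0, 0, 6, 4):
    print(rect)
```

Each leaf's area is proportional to its size. With deep or uneven hierarchies, the alternating cuts quickly produce the long, thin rectangles that the text identifies as this method's weakness.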

The Cluster treemap uses a simple recursive algorithm that
reduces overall aspect ratios (Wattenberg, 1999), while the
Squarified treemap algorithm lays out the children of a
rectangle using a recursive procedure, squarify
(Bruls et al., 2000). This procedure lays out the rectangles in
horizontal and vertical rows. When a rectangle is processed,
a decision is made between two alternatives: either the
rectangle is added to the current row, or the current row is
fixed and a new row is started in the remaining
sub-rectangle. This decision depends only on whether adding a
rectangle to the row will improve the layout of the current
row or not.
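The row decision of the squarified algorithm can be sketched for a single level of the hierarchy. This sketch follows the description of Bruls et al. (2000) summarized above. Values are scaled to the display area and processed in decreasing order. A value joins the current row only if the row's worst aspect ratio does not get worse; otherwise the row is fixed and layout continues in the remaining sub-rectangle. Function names and the example values are illustrative, and recursion into children is omitted.

```python
def worst(row, side):
    """Worst aspect ratio of the areas in `row` laid along `side`."""
    s = sum(row)
    return max(max(side * side * r / (s * s), (s * s) / (side * side * r))
               for r in row)


def squarify(values, x, y, w, h):
    """Squarified layout of one treemap level (sketch).

    values are scaled to fill the w-by-h rectangle and processed in
    decreasing order. Returns (x, y, width, height) tuples, one per
    value, in that order.
    """
    total = sum(values)
    areas = [v * w * h / total for v in sorted(values, reverse=True)]
    rects, i = [], 0
    while i < len(areas):
        side = min(w, h)                # the row runs along the shorter side
        row = [areas[i]]
        i += 1
        # Add to the current row while doing so does not worsen it.
        while i < len(areas) and worst(row + [areas[i]], side) <= worst(row, side):
            row.append(areas[i])
            i += 1
        thickness = sum(row) / side
        offset = 0.0
        for a in row:                   # fix the row ...
            length = a / thickness
            if w >= h:                  # column along the left edge
                rects.append((x, y + offset, thickness, length))
            else:                       # strip along the top edge
                rects.append((x + offset, y, length, thickness))
            offset += length
        if w >= h:                      # ... and continue in the remainder
            x, w = x + thickness, w - thickness
        else:
            y, h = y + thickness, h - thickness
    return rects


rects = squarify([6, 6, 4, 3, 2, 2, 1], 0, 0, 6, 4)
for r in rects:
    print([round(v, 2) for v in r])
```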

These methods also have their drawbacks: changes in the
data set can cause dramatic, discontinuous changes in the
layouts produced by both cluster treemaps and squarified
treemaps. These rapid layout changes also cause an
unattractive flickering that draws attention away from other
aspects of the visualization and makes it hard to find items
on the treemap. Another problem with Cluster and
Squarified treemaps is that their layouts fail to preserve the
order of information, as is done by the slice-and-dice treemap.
Many ordered treemap algorithms were introduced to
address the limitations of the slice-and-dice, Cluster, and
Squarified treemap algorithms. The motivation here is
to create layouts in which items that are next
to each other in a given order are adjacent in the treemap.
Ordered treemaps include the Pivot-by-Split-Size,
Pivot-by-Middle, Split and Strip treemap algorithms. These
ordered treemaps generally change relatively smoothly under
dynamic updates, roughly preserve order, and produce
rectangles with low aspect ratios compared to those of cluster
and squarified treemaps (Shneiderman et al. 2002).

The Pivot-by-Middle algorithm selects the middle item of the
list as the pivot so as to create a balanced layout. As a
result, this algorithm is less sensitive to changes than
Pivot-by-Split-Size. In the Pivot-by-Size variant, the pivot is
taken to be the item (rectangle) with the largest area.
Pivot-by-Split-Size selects the pivot that will split the list
into approximately equal total areas. These two algorithms
create layouts that roughly preserve order and are relatively
efficient, but fail to produce layouts with relatively low
aspect ratios.

Strip algorithm is a modification of the Squarified treemap 
algorithm. It works by processing input rectangles in order, 
laying them out in horizontal or vertical strips of varying 
thickness. It is efficient in that it produces a layout with 
better readability than the basic ordered treemap algorithm, 
and reasonable aspect ratios and stability (Shneiderman et 
al. 2002). 
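Below is a simplified sketch of the basic strip procedure. It assumes horizontal strips filled from the top of the display and items kept in their given order. A new strip is started whenever adding the next item would raise the average aspect ratio of the current strip. The function and helper names and the example are hypothetical, and refinements such as look-ahead are omitted.

```python
def strip_treemap(values, width, height):
    """Strip treemap sketch: items keep their order and are laid out in
    horizontal strips from top to bottom. Returns (x, y, w, h) tuples."""
    total = sum(values)
    areas = [v * width * height / total for v in values]
    rects = []

    def avg_aspect(strip):
        t = sum(strip) / width                   # strip thickness
        return sum(max(a / t / t, t / (a / t)) for a in strip) / len(strip)

    def emit(strip, y):
        """Place a finished strip at height y; return the next y."""
        t = sum(strip) / width
        x = 0.0
        for a in strip:
            rects.append((x, y, a / t, t))
            x += a / t
        return y + t

    y, strip = 0.0, []
    for a in areas:
        # Start a new strip if adding `a` would raise the average aspect ratio.
        if strip and avg_aspect(strip + [a]) > avg_aspect(strip):
            y = emit(strip, y)
            strip = []
        strip.append(a)
    if strip:
        emit(strip, y)
    return rects


print(strip_treemap([1, 1, 1, 1], 2, 2))
# [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 1.0, 1.0), (0.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
```

Because items are never reordered, neighbours in the input stay neighbours in the layout. The price is that aspect ratios are usually somewhat worse than with the squarified method.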

III. Methods 

A. Development of HierarchyMap Algorithm 

The algorithm for the HierarchyMap treemap is as follows: 

Input: Infotree (tree data nodes) $T = \{t_1, t_2, t_3, \ldots, t_n\}$ and a
2-D space divided into four equal rectangles.

i. If the number of hierarchical items to be displayed
is zero (i.e. $|T| = 0$), then nothing is displayed.

ii. If the number of hierarchical items to be displayed
is 1 (i.e. $|T| = 1$), then assign the whole 2-D space to that item.

iii. If the number of items is greater than 1, split the
rectangular 2-D space into four equal parts and
recursively divide each resulting part into four until all
items in the list are exhausted, such that
$\forall t_i \in T_1, \forall t_j \in T_2, \forall t_k \in T_3, \ldots, \forall t_n \in T_n :$
$t_i < t_{i+1} < t_j < t_{j+1} < t_k < t_{k+1} < \cdots < t_n < t_{n+1}$.

iv. An attribute of each hierarchical item corresponds
to the area of one of the nested rectangles, denoted
area(R), in such a manner that the areas correspond to the
sizes of the elements of $T_1$, $T_2$, $T_3$ and $T_4$, where
$\mathrm{area}(R_1) \approx \mathrm{area}(R_2) \approx \mathrm{area}(R_3) \approx \cdots \approx \mathrm{area}(R_n)$.

The algorithm accepts input data in hierarchical form.
The input items are stored in their hierarchical order, read,
and laid out on nested rectangles that make up a treemap
on the computer screen. The entire 2-dimensional computer
screen is first divided into four equal parts, and each of the
successive parts is then repeatedly divided into four parts in
such a way that the resulting rectangles are grouped
according to the node level to be represented in the entire
hierarchical data. This ensures that the order of the items
to be displayed is maintained. These items are then linked to
each of the resulting rectangles that make up the treemap.
Each rectangle that represents a node level of the tree data can
then be clicked repeatedly to display its sub-node elements.
Every other nodal rectangle on the treemap can be clicked
to display its own sub-node elements in a similar manner.
In this way, several thousand items of information
can be displayed and viewed in a single 2-dimensional treemap.
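The description above leaves some details open, such as how items are assigned to quadrants. The sketch below is one hypothetical reading of steps i to iii. The ordered item list is cut into up to four contiguous groups, one per quadrant in reading order, and each quadrant is subdivided recursively until every item has its own rectangle. The function name, the quadrant order and the grouping rule are assumptions, not the authors' Java implementation, and interactive drill-down is not modelled.

```python
def hierarchy_quads(items, x, y, w, h, out=None):
    """Hypothetical reading of steps i-iii of the HierarchyMap layout.

    The ordered list `items` is cut into up to four contiguous groups,
    one per quadrant (top-left, top-right, bottom-left, bottom-right),
    and each quadrant is subdivided again until every item has its own
    rectangle. Returns (item, x, y, width, height) tuples in order.
    """
    if out is None:
        out = []
    if not items:                      # step i: nothing to display
        return out
    if len(items) == 1:                # step ii: the item gets the whole space
        out.append((items[0], x, y, w, h))
        return out
    hw, hh = w / 2, h / 2              # step iii: four equal quadrants
    quadrants = [(x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)]
    chunk = -(-len(items) // 4)        # ceil(n / 4) items per quadrant
    for q, (qx, qy) in enumerate(quadrants):
        group = items[q * chunk:(q + 1) * chunk]
        hierarchy_quads(group, qx, qy, hw, hh, out)
    return out


for rect in hierarchy_quads(["a", "b", "c", "d", "e"], 0, 0, 8, 8):
    print(rect)
```

Under this reading the item order is preserved, but every split produces four equal cells. Some cells therefore stay empty when the number of items is not a power of four; in the example, the bottom-right quadrant is unused.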

IV. Results and Discussion

The HierarchyMap algorithm was tested with several sample
datasets of information structures such as a university
system, a family system, and animal taxonomy. The results
of this implementation are shown in Figures 3, 4 and 5.
Figure 3 shows the treemap appearance with no information.
Figure 4 shows the treemap representation of ten different
family structures and the adjustment of each of the rectangles
to reduce their aspect ratio, improve their readability and
reduce the number of thin rectangles. Finally, Figure 5
shows the HierarchyMap for the combination of several tree
structures, capable of displaying thousands of information
items. It also shows the adjustment of the rectangles as data
is updated, to demonstrate the algorithm's performance on the
treemap metrics (i.e. aspect ratio, readability, ordering and
capability for change).

The results of this implementation also show that the
HierarchyMap algorithm is similar to other existing
treemaps in that it lays out hierarchical information on
nested rectangles, and it adds a further advantage by making it
possible to display very large volumes of hierarchical
information through repeated clicking of node-level rectangles,
as demonstrated in the implementation.




Figure 3: HierarchyMap showing nested rectangles without information

Figure 4: HierarchyMap representing ten different family structures

Figure 5: HierarchyMap representing a combination of several hierarchical structures.

V. Conclusions

In this work, we developed and implemented a novel treemap
algorithm called HierarchyMap, which improves on the
limitations of existing treemap algorithms such as
Slice-and-dice, Cluster, Squarified and Strip, and adds a new
feature that enables viewing several thousand items of
hierarchical information by clicking on any of the nodal
rectangles. The results showed that the HierarchyMap
treemap algorithm is capable of adjusting whenever data are
updated; it also improved readability and preservation of
order, lowered aspect ratios, and reduced the number of thin
rectangles. The combination of these treemap metrics makes
HierarchyMap a promising treemap algorithm for the future.
Abstract- Modern telecommunication and computer networks,
including both wired and wireless communications and the
Internet, are being designed for fast transmission of large
amounts of data, for which congestion control is very
important. Without a proper congestion control mechanism,
such networks would be prone to congestion collapse.
Congestion control for streamed media traffic over a network
is a challenge because of the sensitivity of such traffic. This
challenge has motivated researchers over the last decade to
develop a number of congestion control protocols and
mechanisms that suit this traffic and provide fair treatment
for both unicast and multicast communications. This paper
gives a brief survey of the major congestion control
mechanisms and their categorization characteristics,
elaborates on the TCP-friendliness concept, and then presents
the state of the art in congestion control mechanisms
designed for networks. The paper points out the pros and
cons of these mechanisms and evaluates their characteristics.

Keywords- TCP-Friendliness, Goals, and Metrics of 
Congestion Control and UDP Traffic 

I. Introduction 

Congestion control over networks, for all types of media
traffic, has been an active area of research in the last
decade [1]. This is due to the rapid growth of audiovisual
traffic brought about by digital convergence. A variety of
network applications stream media either in real time or on
demand, such as video streaming and conferencing, voice over
IP (VoIP), and video on demand (VoD). The number of users of
these network applications is growing continuously, which
results in congestion.

Not all network applications use TCP, and those that do not
therefore do not share the available bandwidth fairly. So
far, the unfairness of non-TCP applications has not had much
impact, because most of the traffic in the network uses
TCP-based protocols. However, the number of audio/video
streaming applications, such as Internet audio and video
players, video conferencing and similar real-time
applications, is steadily increasing, and the proportion of
non-TCP traffic is expected to grow soon. Because these
applications commonly do not integrate TCP-compatible
congestion control mechanisms, they treat competing TCP
flows unfairly: all TCP flows reduce their data rates in an
attempt to relieve the congestion, whereas the non-TCP flows
continue to send at their original rate. This highly unfair
situation can lead to starvation of TCP traffic, i.e.,
congestion collapse [2], [3], which describes the undesirable
situation where the available bandwidth in a network is
almost entirely occupied by packets that are discarded
because of congestion before they reach their destination.

For this reason, it is desirable to define suitable congestion
control mechanisms for non-TCP traffic that are compatible
with the rate-adaptation mechanism of TCP. These
mechanisms should make non-TCP applications TCP-friendly,
and thus lead to a fair distribution of bandwidth. Unicast is
a one-to-one form of communication in networks, whereas
multicast is one-to-many. Multicast is advantageous over
unicast, particularly in reducing bandwidth, but unicast is
still the most widely used form of communication in networks.

II. Theory of Congestion Control System

Congestion control is concerned with controlling the traffic
in a telecommunications network to prevent congestive
collapse, by trying to avoid unfair allocation of any of the
network's processing or link capabilities and by taking
appropriate resource-reducing steps, such as reducing the
rate at which packets are sent.

A. Goals and Metrics of Congestion Control 

The goals used to evaluate a congestion control algorithm
are:

i. To achieve high bandwidth utilization.

ii. To converge to fairness quickly and efficiently.

iii. To reduce the amplitude of oscillations.

iv. To sustain high responsiveness.

v. To coexist fairly and be compatible with long-established,
widely used protocols.

The metrics [24] that have been set for congestion control
are:

i. Convergence Speed - The convergence speed estimates
the time elapsed before the equilibrium state is
reached.

ii. Smoothness - The smoothness reflects the magnitude
of the oscillations caused by multiplicative reduction;
it depends on the size of the oscillations.

iii. Responsiveness - The responsiveness is measured
by the number of steps or round-trip times (RTTs)
needed to reach equilibrium.

The difference between responsiveness and convergence
speed is that responsiveness relates to a single flow,
whereas convergence relates to the whole system.

iv. Efficiency - The efficiency is the average flow
throughput per step or per round-trip time (RTT) when
the system is in equilibrium.

v. Fairness - The fairness characterizes how fairly
resources are allocated among the flows sharing a
bottleneck link.
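The fairness metric is often quantified with Jain's fairness index, J = (Σx_i)² / (n·Σx_i²), which is 1 when all n flows receive equal throughput and 1/n when a single flow takes all of it. The survey does not prescribe a particular index, so the use of Jain's index and the helper name `jain_fairness` below are our assumptions.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 for equal shares, 1/n when one flow gets everything."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        return 1.0  # every flow idle: treated as trivially fair (an assumed convention)
    return total * total / (n * sum_sq)


print(jain_fairness([5.0, 5.0, 5.0, 5.0]))   # 1.0
print(jain_fairness([10.0, 0.0, 0.0, 0.0]))  # 0.25
```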
III. Classification of Congestion Control Algorithms

Congestion control algorithms are mainly classified by the
following criteria:

i. By the type and amount of feedback received from the
network.

ii. By incremental deployability in the network: only the
sender needs modification; the sender and receiver need
modification; only the routers need modification; or all
three (sender, receiver and routers) need modification.

iii. By the aspect of performance they aim to improve:
high-bandwidth networks, lossy links, fairness, advantage
to short flows, or variable-rate links.

iv. By the fairness criterion used: max-min, proportional,
or "minimum potential delay".

A. Classification of Congestion Control by Network 

Congestion control algorithms can be categorized using 
network awareness as a criterion. The following are the 
three categories for the congestion control mechanisms. 

The Black Box consists of algorithms that treat the network
as a black box, assuming no knowledge of its state other than
binary feedback upon congestion.

The Grey Box consists of approaches that use measurements
to estimate the available bandwidth, the level of contention,
or even the transient characteristics of congestion. Because
such estimates and measurements may be wrong, the network is
considered a grey box.

The Green Box contains bimodal congestion control, which
calculates the fair share explicitly, as well as
network-assisted control, in which the network explicitly
communicates its state to the transport layer. Hence, the
network is considered a green box.

i. The Black Box 

Black-box congestion control is also called blind congestion
control, and this methodology uses the Additive Increase
Multiplicative Decrease (AIMD) algorithm, which implements
the TCP window adjustments. These algorithms achieve
stability in situations where the demand of competing flows
exceeds the available bandwidth of the channel. The
congestion control mechanism in conventional TCP is based on
the fundamental idea of AIMD. In TCP-Tahoe, TCP-NewReno
and TCP-Sack, the additive increase phase is adopted exactly
as in AIMD while the protocol is in the congestion avoidance
phase. In case of a packet drop, TCP-Tahoe uses a more
conservative method instead of multiplicative reduction: the
congestion window is reset and the protocol re-enters the
slow-start phase. In TCP-NewReno and TCP-Sack, on the other
hand, when the sender receives three duplicate ACKs (DACKs),
multiplicative reduction is applied to both the congestion
window and the slow-start threshold, and the protocol remains
in the congestion avoidance phase. When the retransmission
timeout expires, they enter the slow-start phase as in
TCP-Tahoe.
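The window rules described above can be sketched per event, with windows counted in segments. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the initial slow-start threshold of 64 segments is illustrative, and fast-recovery details such as window inflation and SACK bookkeeping are omitted.

```python
from dataclasses import dataclass


@dataclass
class TcpWindow:
    cwnd: float = 1.0       # congestion window, in segments
    ssthresh: float = 64.0  # slow-start threshold (illustrative initial value)
    variant: str = "reno"   # "tahoe" or "reno" (NewReno/SACK treated like "reno" here)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 per ACK, roughly doubling per RTT
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 segment per RTT

    def on_triple_dupack(self):
        self.ssthresh = max(self.cwnd / 2.0, 2.0)  # multiplicative decrease of the threshold
        if self.variant == "tahoe":
            self.cwnd = 1.0                # Tahoe: reset the window and restart slow start
        else:
            self.cwnd = self.ssthresh      # Reno family: halve and stay in congestion avoidance

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0                    # every variant re-enters slow start


reno = TcpWindow(cwnd=20.0, variant="reno")
tahoe = TcpWindow(cwnd=20.0, variant="tahoe")
reno.on_triple_dupack()
tahoe.on_triple_dupack()
print(reno.cwnd, tahoe.cwnd)  # 10.0 1.0
```

Here the Reno-style sender halves its window to 10 segments and stays in congestion avoidance, while the Tahoe-style sender drops to one segment and restarts slow start.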

Highspeed-TCP - Highspeed-TCP modifies the response
function in environments with a high delay-bandwidth
product: it increases the congestion window more aggressively
upon receiving an acknowledgment and reduces the window more
gently upon a loss event.

BIC-TCP - The Binary Increase Congestion Control protocol
uses a concave increase of the source's rate after each
congestion event until the window equals the window before
the event, to maximize the time during which the network is
fully utilized.

CUBIC TCP - CUBIC is a less aggressive and more systematic
derivative of BIC, in which the window is a cubic function of
the time since the last congestion event, with the inflection
point set to the window size prior to that event.
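The cubic curve can be written explicitly. Following RFC 8312 (not this survey), W(t) = C(t - K)³ + W_max with K = ∛(W_max(1 - β)/C), where t is the time in seconds since the last congestion event and W_max is the window just before it. The constants C = 0.4 and β = 0.7 are the RFC's suggested defaults and are used here only for illustration.

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC window (segments) t seconds after a loss that occurred at window w_max.

    c and beta are the RFC 8312 default constants, assumed here for illustration.
    """
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)  # time needed to climb back to w_max
    return c * (t - k) ** 3 + w_max


for t in (0.0, 2.0, 4.0, 6.0):
    print(t, round(cubic_window(t, 100.0), 1))
```

With W_max = 100 segments the window restarts at 70, flattens out near W_max around t = K ≈ 4.2 s, and then grows again to probe for more bandwidth.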

AIMD-FC - Additive Increase Multiplicative Decrease with
Fast Convergence, a recent improvement of AIMD, is not based
on a new algorithm but on an optimization of AIMD and its
convergence procedure that enables the algorithm to converge
faster and attain higher efficiency.

Binomial Mechanisms - Binomial mechanisms form a new class
of nonlinear congestion control algorithms, named Binomial
Congestion Control Algorithms. They are called binomial
because their control rules combine two algebraic terms with
different exponents.
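The two terms can be made concrete. In the usual statement of binomial congestion control (from the original binomial-algorithm proposal; the survey does not spell it out), the window w grows by α/w^k per RTT without loss and shrinks by β·w^l on a loss. Setting k = 0 and l = 1 recovers AIMD, and k + l = 1 is the condition usually given for TCP compatibility. The α and β defaults below are illustrative.

```python
def binomial_increase(w, k, alpha=1.0):
    """Per-RTT increase when no loss is seen: w + alpha / w**k (alpha is illustrative)."""
    return w + alpha / w ** k


def binomial_decrease(w, l, beta=0.5):
    """Decrease on a loss event: w - beta * w**l, never below one segment."""
    return max(w - beta * w ** l, 1.0)


# (k, l) pairs with k + l = 1 are the TCP-compatible members of the family.
for name, k, l in [("AIMD", 0.0, 1.0), ("IIAD", 1.0, 0.0), ("SQRT", 0.5, 0.5)]:
    print(name, binomial_increase(16.0, k), binomial_decrease(16.0, l))
```

For a 16-segment window the loop prints `AIMD 17.0 8.0`, `IIAD 16.0625 15.5` and `SQRT 16.25 14.0`: the nonlinear variants grow more slowly but reduce the window more gently than AIMD.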

SIMD Protocol - SIMD is a TCP-friendly nonlinear
congestion control algorithm that controls congestion by
utilizing history information.

GAIMD - General AIMD Congestion Control generalizes the
congestion control mechanism of AIMD by parameterizing the
additive increase value α and the multiplicative decrease
ratio β.
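A useful consequence of this parameterization is a friendliness condition linking α and β. Yang and Lam, who proposed GAIMD, derive that GAIMD(α, β) obtains roughly the same long-term throughput as TCP when α = 4(1 - β²)/3, where β is the fraction of the window kept after a loss; standard TCP, GAIMD(1, 0.5), satisfies it. The condition comes from their analysis rather than from this survey, and the helper name is ours.

```python
def gaimd_friendly_alpha(beta):
    """Additive increase (segments per RTT) that makes GAIMD(alpha, beta) TCP-friendly.

    beta is the fraction of the window kept after a loss (standard TCP uses 0.5).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return 4.0 * (1.0 - beta ** 2) / 3.0


print(gaimd_friendly_alpha(0.5))              # standard TCP: 1.0
print(round(gaimd_friendly_alpha(7 / 8), 4))  # smoother variant: 0.3125
```

A gentler decrease (larger β) must be paid for with a slower additive increase, which is why smooth TCP-friendly variants ramp up more slowly.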

ii. The Grey Box 

The Grey Box is also called measurement-based congestion
control. Standard TCP relies on packet losses as an implicit
congestion signal from congested links. However, packet loss
has several drawbacks as an indication of congestion:

i. Packet loss can be caused by random bit corruption even
when bandwidth is still available.

ii. Acknowledgement-based loss detection at the sender side
can be affected by cross-traffic on the reverse path.

iii. Packet loss, as binary feedback, cannot indicate the level
of contention before congestion occurs.

Therefore, an efficient window adjustment tactic should 
reflect various network conditions, which cannot all be 
captured simply by packet drops. Several measurement- 
based transport protocols gather information on current 
network conditions. 

TCP Vegas - TCP Vegas estimates the queuing delay and
linearly increases or decreases the window so as to keep a
constant number of packets per flow queued in the network.
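The Vegas rule can be sketched as follows: compare the expected rate cwnd/baseRTT with the actual rate cwnd/RTT, convert the difference into an estimate of how many segments are queued, and move the window by one segment when that estimate leaves the band [α, β]. The thresholds α = 2 and β = 4 are illustrative assumptions, and baseRTT is taken to be the minimum RTT observed so far.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=2.0, beta=4.0):
    """One per-RTT TCP Vegas adjustment; returns the new cwnd in segments.

    alpha and beta (segments queued) are illustrative thresholds.
    """
    expected = cwnd / base_rtt                # rate if nothing were queued
    actual = cwnd / rtt                       # rate actually achieved this RTT
    queued = (expected - actual) * base_rtt   # estimated segments sitting in queues
    if queued < alpha:
        return cwnd + 1.0                     # too little in flight: linear increase
    if queued > beta:
        return cwnd - 1.0                     # queues building up: linear decrease
    return cwnd                               # inside [alpha, beta]: hold the window


print(vegas_update(20.0, 0.100, 0.100))  # no queuing: window grows
print(vegas_update(20.0, 0.100, 0.200))  # RTT doubled: window shrinks
```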

FAST TCP - FAST achieves the same equilibrium as Vegas
but uses proportional control instead of linear increase, and
it intentionally scales the gain down as the bandwidth
increases in order to ensure stability.

TCP-Westwood - In TCP-Westwood, a loss causes the window
to be reset to the sender's estimate of the bandwidth-delay
product, which is the minimum measured round-trip time
multiplied by the observed rate of receiving acknowledgements.

TFRC - TCP-Friendly Rate Control (TFRC) is a rate-based
congestion control mechanism that aims to compete fairly for
bandwidth with TCP flows in the network.
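TFRC derives its allowed sending rate from a model of TCP throughput. The survey does not give the model; the sketch below uses the TCP throughput equation specified for TFRC in RFC 5348, with the RFC's simplification t_RTO = 4R. Estimating the loss event rate p, which TFRC does at the receiver, is outside the scope of this sketch.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """TCP throughput equation used by TFRC (RFC 5348), in bytes per second.

    s: segment size in bytes, rtt: round-trip time in seconds,
    p: loss event rate in (0, 1], b: packets acknowledged by each ACK.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4.0 * rtt  # simplification recommended by RFC 5348
    denom = (rtt * math.sqrt(2.0 * b * p / 3.0)
             + t_rto * 3.0 * math.sqrt(3.0 * b * p / 8.0) * p * (1.0 + 32.0 * p ** 2))
    return s / denom


print(round(tfrc_rate(1460, 0.1, 0.01)))  # bytes/s for 1460-byte segments, 100 ms, 1% loss
```

For these inputs the function gives roughly 164 kB/s; raising either p or the RTT lowers the allowed rate, which is how TFRC stays within what a TCP flow would obtain under the same conditions.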

TCP-Real - TCP-Real is a receiver-oriented,
measurement-based congestion control mechanism that improves
the overall performance of TCP over heterogeneous wired and
wireless networks and over asymmetric paths.

TCP-Jersey - TCP-Jersey is also a TCP-based scheme; it
focuses on the efficiency of the transport mechanism in the
network.

iii. The Green Box 

The Green Box contains the bimodal congestion control
mechanism, by which the fair share of each flow in the
network can be calculated explicitly. Bimodal Mechanism - In
the Bimodal Congestion Avoidance and Control mechanism, the
fair share of the total bandwidth that should be allocated to
each flow is computed at any point during the execution of
the system.

Random Early Detection - In Random Early Detection
(RED), packets are randomly dropped in proportion to the
router's queue size, triggering multiplicative decrease in
some flows.

Explicit Congestion Notification - In Explicit Congestion
Notification (ECN), routers are enabled to probabilistically
mark a bit in the IP header instead of dropping packets, to
notify the end hosts of imminent congestion when the queue
length exceeds a threshold [23].

VCP - The variable-structure congestion control protocol
(VCP) uses two ECN (Explicit Congestion Notification) bits
to explicitly feed back the network's state of congestion.

IV. Congestion Control Algorithms

A. Drop Tail Algorithm 

F. Postiglione et al. discussed the Drop Tail (DT)
algorithm [15], the simplest and most commonly used
algorithm in current networks, which drops packets from the
tail of the full queue buffer. The main advantages of this
algorithm are its simplicity, its suitability to heterogeneity,
and its decentralized nature. However, it also has some
serious disadvantages, such as lack of fairness, no protection
against misbehaving or non-responsive flows (i.e., flows
whose sending rate is not reduced after receiving congestion
signals from gateway routers), and no relative Quality of
Service (QoS). QoS is of particular concern for the
continuous transmission of high-bandwidth video and
multimedia information [15], and transmitting such content
is difficult in the present Internet and in networks that use
DT.



B. Random Early Detection Algorithm 

B. Braden et al. discussed that the Random Early Detection
(RED) algorithm had been proposed mainly for implementing
Active Queue Management (AQM) [4]. On the arrival of each
packet, the average queue size is calculated using an
Exponential Weighted Moving Average (EWMA) [5]. The
computed average queue size is then compared with the
minimum and maximum thresholds to determine the next
action.
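The per-arrival RED computation described above can be sketched as follows. The thresholds, max_p and the EWMA weight w_q = 0.002 are illustrative values, and two refinements of the original RED design (spacing drops using a count of packets since the last drop, and the "gentle" region above the maximum threshold) are deliberately omitted.

```python
import random


class RedQueue:
    """Minimal RED drop decision; all parameter values are illustrative."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, w_q=0.002, rng=None):
        self.min_th, self.max_th, self.max_p, self.w_q = min_th, max_th, max_p, w_q
        self.avg = 0.0                    # EWMA of the instantaneous queue length
        self.rng = rng or random.Random()

    def drop_probability(self, queue_len):
        # EWMA update on each arrival: avg <- (1 - w_q) * avg + w_q * q
        self.avg = (1.0 - self.w_q) * self.avg + self.w_q * queue_len
        if self.avg < self.min_th:
            return 0.0                    # below the minimum threshold: always enqueue
        if self.avg >= self.max_th:
            return 1.0                    # at or above the maximum threshold: always drop/mark
        # between the thresholds the probability rises linearly up to max_p
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len):
        return self.rng.random() < self.drop_probability(queue_len)


red = RedQueue(rng=random.Random(1))
for q in (2, 8, 12, 20):
    print(q, red.should_drop(q), round(red.avg, 4))
```

Because w_q is small, the average reacts slowly to short bursts, which is what lets RED absorb transient spikes while still reacting to sustained congestion.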

C. CHOKe Algorithm

Konstantinos Psounis et al. proposed the CHOKe algorithm
[6], [7]. Whenever a new packet arrives at the congested
gateway router, a packet is drawn at random from the FIFO
buffer and compared with the arriving packet. If both belong
to the same flow, both are dropped; otherwise, the randomly
chosen packet is kept intact and the new incoming packet is
admitted into the buffer with a probability that depends on
the level of congestion. This probability is computed in the
same way as in RED. CHOKe is a simple, stateless algorithm
that requires no special data structure. However, it does not
perform well when the number of flows is large compared with
the buffer space.
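The arrival-handling logic described above can be sketched as a single function. Assumptions: the buffer is modeled as a deque of flow identifiers, the RED drop probability is computed elsewhere (for example by a RED queue) and passed in, and the comparison step runs only when that probability is positive, as a stand-in for "the average queue exceeds the minimum threshold". The function name is hypothetical.

```python
import random
from collections import deque


def choke_arrival(queue, flow_id, red_drop_prob, rng):
    """Process one packet arrival at a CHOKe queue (simplified sketch).

    queue:         deque of flow ids of buffered packets, in FIFO order
    red_drop_prob: drop probability RED would assign to this arrival
    Returns "dropped-both", "dropped" or "admitted".
    """
    # Assumption: a positive RED probability stands in for "average queue above min_th".
    if red_drop_prob > 0.0 and queue:
        victim = rng.randrange(len(queue))  # draw one buffered packet at random
        if queue[victim] == flow_id:        # same flow: drop both packets
            del queue[victim]
            return "dropped-both"
    if rng.random() < red_drop_prob:        # different flow: fall back to RED's decision
        return "dropped"
    queue.append(flow_id)
    return "admitted"


buf = deque(["A", "A", "A", "B"])
print(choke_arrival(buf, "A", 0.2, random.Random(7)), list(buf))
```

A flow that occupies a large share of the buffer is, by construction, the one most likely to be matched and penalized, which is how CHOKe restrains aggressive flows without per-flow state.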

D. Blue Algorithms 

Rong Pan et al. discussed that the basic idea behind RED
queue management is to detect incipient congestion early and
to feed this congestion notification back to the end hosts,
allowing them to decrease their sending rates accordingly.
However, the RED queue length gives very little information
about the number of contending connections on a shared link
of the network.

BLUE and Stochastic Fair Blue (SFB) were designed to
overcome the drawbacks of RED. BLUE manages congestion
using packet loss and link idle events, and SFB uses the same
events to protect TCP flows against non-responsive flows.
SFB is highly scalable and enforces fairness using a very
small amount of state information and a small amount of
buffer space. It is a FIFO queuing algorithm that identifies
and rate-limits non-responsive flows based on accounting
similar to that used by BLUE [7].
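BLUE's core is a single marking probability that is adjusted by loss and idle events rather than by queue length. A minimal sketch, assuming illustrative step sizes d1 and d2 and a freeze time that limits how often the probability may change; the class and method names are hypothetical. SFB extends the idea by keeping several such probabilities in hashed bins, which is not shown here.

```python
class BlueMarker:
    """Sketch of BLUE's marking-probability update (parameter values are illustrative)."""

    def __init__(self, d1=0.02, d2=0.002, freeze_time=0.1):
        self.p_mark = 0.0
        self.d1, self.d2, self.freeze_time = d1, d2, freeze_time  # d1 usually > d2
        self.last_update = float("-inf")

    def _may_update(self, now):
        # Do not change p_mark more often than once per freeze_time seconds.
        return now - self.last_update > self.freeze_time

    def on_packet_loss(self, now):
        # Buffer overflow: current marking is not controlling congestion, so increase it.
        if self._may_update(now):
            self.p_mark = min(1.0, self.p_mark + self.d1)
            self.last_update = now

    def on_link_idle(self, now):
        # Link went idle: marking is too aggressive, so back off.
        if self._may_update(now):
            self.p_mark = max(0.0, self.p_mark - self.d2)
            self.last_update = now


blue = BlueMarker()
blue.on_packet_loss(0.0)
blue.on_packet_loss(0.05)  # ignored: still inside the freeze time
print(blue.p_mark)         # 0.02
```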

E. Random Exponential Marking Algorithm 

According to Debanjan Saha, the Random Exponential
Marking (REM) algorithm [8] is a new technique for
congestion control that aims to achieve high utilization of
link capacity, scalability, and negligible loss and delay. The
main limitations of this algorithm are that it does not give
incentives to cooperative sources, and that a properly
calculated and fixed value of φ must be known globally.
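The role of φ is easier to see with the REM update rules written out. The sketch follows the REM literature as we understand it: each link keeps a "price" that grows when its backlog exceeds a target or its input rate exceeds capacity, and marks packets with probability 1 - φ^(-price). The values of γ, α and φ, and the function names, are illustrative assumptions.

```python
def rem_price_update(price, backlog, target_backlog, input_rate, capacity,
                     gamma=0.001, alpha=0.1):
    """One REM link-price update; gamma and alpha are illustrative step sizes."""
    # The price rises when the backlog exceeds its target or the input exceeds capacity.
    mismatch = alpha * (backlog - target_backlog) + (input_rate - capacity)
    return max(0.0, price + gamma * mismatch)


def rem_mark_probability(price, phi=1.001):
    """Exponential marking 1 - phi**(-price); phi > 1 must be the same network-wide."""
    return 1.0 - phi ** (-price)


# Two links in series: the chance that a packet leaves the path unmarked
# depends only on the sum of the link prices, provided both links use the same phi.
p1, p2 = 200.0, 300.0
unmarked = (1 - rem_mark_probability(p1)) * (1 - rem_mark_probability(p2))
print(unmarked, 1 - rem_mark_probability(p1 + p2))
```

Because a packet crossing several links is left unmarked with probability φ^(-p_1)·φ^(-p_2)··· = φ^(-Σp_l), the end-to-end marking probability reveals the total path price only if every link uses the same φ, which is the global-knowledge requirement noted above.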

F. Fair Queuing Algorithms 

Alan Demers et al. proposed the Fair Queuing (FQ)
algorithm [9]; FQ and Stochastic Fair Queuing [10] are
mainly used in multimedia integrated services networks for
their fairness and bounded delay. The frame-based class of FQ
is called Weighted Round Robin (WRR) [11], a router queue
scheduling method in which queues are serviced in round-robin
fashion in proportion to a weight assigned to each flow or
queue.
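A minimal WRR scheduler, assuming per-packet weights and roughly equal packet sizes (byte-based variants handle variable packet sizes differently). The function name and the example queues are hypothetical.

```python
from collections import deque


def weighted_round_robin(queues, weights, rounds):
    """Serve up to weights[i] packets from queues[i] per round, in round-robin order."""
    served = []
    for _ in range(rounds):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break  # queue empty: its remaining turns in this round are skipped
                served.append(q.popleft())
    return served


a = deque(["a1", "a2", "a3", "a4"])
b = deque(["b1", "b2", "b3", "b4"])
print(weighted_round_robin([a, b], [3, 1], rounds=2))
# ['a1', 'a2', 'a3', 'b1', 'a4', 'b2']
```

With weights 3 and 1, the first queue receives three service slots for every one given to the second while both are backlogged.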

G. Virtual Queue Algorithm 

The Virtual Queue (VQ) algorithm is a novel technique
proposed by Gibben and Kelly [12]. In this scheme, a virtual
queue with the same arrival rate as the real queue is
maintained at the link, but its capacity is smaller than that
of the real queue. When packets are dropped from the virtual
queue, all packets already enqueued in the real queue and all
new incoming packets are marked until the virtual queue
becomes empty again.

H. Adaptive Virtual Queue Algorithm 

In the Adaptive Virtual Queue (AVQ) algorithm [13],
discussed by R.J. Gibben et al., a virtual queue is maintained
at the link based on the link capacity and the desired
utilization. The buffer size of the virtual queue is the same
as that of the real queue, and the virtual queue capacity is
updated at the arrival of each packet. However, the
adaptation of the virtual queue does not suitably follow
varying traffic patterns of the flows in the network, and AVQ
is also a FIFO-based methodology.

V. TCP-Friendliness

TCP is a connection-oriented unicast protocol that provides
reliable data transfer with flow and congestion control. TCP
maintains a congestion window, which controls the number of
outstanding unacknowledged data packets in the network. The
sender can send packets only as long as free slots are
available, because each packet sent consumes a slot of the
window. When an acknowledgment for outstanding packets is
received, the window is shifted so that the acknowledged
packets leave the window and the same number of free slots
becomes available for new data. During slow start, the rate
roughly doubles each round-trip time (RTT) so that TCP
quickly approaches its fair share of bandwidth. In its steady
state, TCP uses an additive increase, multiplicative decrease
mechanism to probe for additional bandwidth and to react to
congestion. TCP increases the congestion window by one slot
per round-trip time when there is no sign of loss. When
packet loss is indicated by a timeout, the congestion window
is reduced to one slot, and TCP re-enters the slow-start
phase.



TCP-friendliness can be measured through the effect of a
non-TCP flow on competing TCP flows under the same
conditions, in terms of throughput and other parameters. A
non-TCP unicast flow is TCP-friendly if it does not reduce
the long-term throughput of any of the coexisting TCP flows
by more than a TCP flow would under the same conditions. A
multicast flow is said to be TCP-friendly if it is TCP-friendly
for each sender-receiver pair of the multicast flow.
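One common way to turn the unicast definition into a check is to compare a flow's long-term rate with the rate predicted for a TCP flow under the same conditions. The sketch below uses the simplified steady-state TCP throughput model of Mathis et al., (MSS/RTT)·√(3/(2p)); this model is our assumption, since the text does not prescribe one, and because it ignores timeouts it is only reasonable at low loss rates. The function names and the tolerance parameter are hypothetical.

```python
import math


def tcp_reference_rate(mss_bytes, rtt_s, loss_rate):
    """Simplified steady-state TCP throughput in bytes/s: (MSS / RTT) * sqrt(3 / (2p))."""
    return (mss_bytes / rtt_s) * math.sqrt(3.0 / (2.0 * loss_rate))


def is_tcp_friendly(flow_rate, mss_bytes, rtt_s, loss_rate, tolerance=1.0):
    """True if a non-TCP flow's long-term rate does not exceed a TCP flow's rate
    under the same MSS, RTT and loss rate (scaled by an optional tolerance)."""
    return flow_rate <= tolerance * tcp_reference_rate(mss_bytes, rtt_s, loss_rate)


ref = tcp_reference_rate(1460, 0.1, 0.01)
print(round(ref), is_tcp_friendly(150_000, 1460, 0.1, 0.01), is_tcp_friendly(250_000, 1460, 0.1, 0.01))
```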

A. TCP-Friendliness Vs UDP Traffic 

One of the serious drawbacks of FIFO-based queue
management is that there is no way to regulate connections
that send more than their bandwidth share and are
non-responsive, or very slow to respond [18], to congestion
indications. To provide a fair share of the available
bandwidth to all TCP-friendly connections that respond to
congestion indications, misbehaving connections should be
effectively regulated by a queue management algorithm. One
possible way to solve this problem is to use per-flow queuing
to discriminate against non-TCP-friendly connections and to
provide a fair bandwidth share to all connections. It is also
possible to give TCP-friendly connections an incentive in the
form of financial benefits. Another possible method is to
introduce a new concept of service, i.e., differentiated
services, for connections; differentiated services are being
studied by the Differentiated Services Working Group in the
IETF [17].

VI. Classification of Congestion Control Protocols

Congestion control protocols are classified into four major
categories according to a number of features of their working
mechanism [22]. These categories are as follows.

A. Window-Based Congestion Control 

Window-based protocols use a congestion window maintained
at the sender or receiver side [25]. A slot in that window is
reserved for each packet; when a sent packet is acknowledged,
its slot becomes free, and transmission is allowed only when
free slots are available. In the absence of congestion the
window size increases, and it decreases when congestion
occurs in the network [14].

B. Rate-Based Congestion Control 

Rate-based protocols adapt their transmission rate according
to a feedback algorithm that signals congestion when it
occurs. Rate-based algorithms can be subdivided into simple
mechanisms and other congestion control mechanisms. Simple
schemes produce a sawtooth-shaped throughput, which is
usually not well suited to streaming media applications.
Current research aims to design rate-adjustment mechanisms
that ensure the fairest possible competition between TCP and
non-TCP flows in the network.

C. Single-rate Congestion Control 

Single-rate congestion control mechanisms are generally
adopted by unicast congestion control protocols. A unicast
transmission has only one recipient, so the sending rate is
adapted according to that recipient's status. Multicast
transmission can also adopt the single-rate approach, in
which the sender streams the data at the same rate to all
recipients of the multicast group in the network.

D. Multi-rate Congestion Control 

Multi-rate congestion control uses the layered multicast
approach, in which the sender's data is divided into different
layers that are sent to different multicast groups. Every
receiver joins the largest number of groups permitted by the
bottleneck on its path to the sender, and the quality of the
data delivered to a receiver increases as it joins more
multicast groups. This feature is most evident in multicast
video sessions: the more groups a recipient subscribes to,
the more layers it receives and the better the video quality.
For other bulk data, additional layers decrease the transfer
time [21]. With this mechanism, congestion control is
achieved entirely through the group management and routing
mechanisms of the underlying multicast protocol.
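The receiver-side rule, joining the largest number of cumulative layers that fits the bottleneck, can be sketched as follows. In practice the bottleneck rate is not known in advance and receivers must infer it (for example from the loss observed after joining a layer); here it is passed in directly, and the layer rates are made-up numbers.

```python
def layers_to_join(layer_rates, bottleneck_rate):
    """Number of cumulative layers a receiver can subscribe to.

    layer_rates[0] is the base layer; each further layer adds its rate on top.
    """
    total, joined = 0.0, 0
    for rate in layer_rates:
        if total + rate > bottleneck_rate:
            break  # the next layer would exceed this receiver's bottleneck
        total += rate
        joined += 1
    return joined


layers = [64.0, 64.0, 128.0, 256.0]  # hypothetical per-layer rates in kbit/s
for bottleneck in (100.0, 300.0, 1000.0):
    print(bottleneck, layers_to_join(layers, bottleneck))
```

Receivers behind a 100, 300 and 1000 kbit/s bottleneck join 1, 3 and 4 layers respectively, so each one receives the best quality its own path can carry without slowing the others.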

VII. Areas of Future Research

As is the case with any evolving research area, several
issues remain unsolved. One particular problem is the lack of
standard methods for comparing congestion control protocols.
A testbed that investigates important aspects such as
fairness and scalability, combined with measures to directly
compare protocol performance [20] and a standardized suite of
test scenarios, would be very useful. While such a testbed is
not sufficient to explore all the details of a particular
protocol, it would provide a sensible basis for more objective
comparisons of protocols.

In many cases, the simulation scenarios presented for a
protocol concentrate on a few general scenarios and are
frequently too simple to capture the behavior and
characteristics of the protocol in non-standard situations.
Traffic conditions in the network are becoming too complex to
be modeled in all aspects by a network simulator, which makes
it important to also evaluate protocols with real-world
applications. We have already discussed the characteristics
and behavior of single-rate and multi-rate congestion control.
It may well be possible that other forms of congestion
control, perhaps with router support, are practical and do
not suffer from the disadvantages of these methods. While
TCP-friendliness is a practical fairness measure in today's
networks, it is also possible that future network
architectures will allow or require different definitions of
fairness. Fairness definitions for multicast, and many
related methodologies, are also still subject to research.

We presented some possible factors and methods to overcome
these issues, and also briefly addressed a different form of
fairness in which multicast flows are allowed to use a higher
percentage of bandwidth than unicast flows; however, these
are by no means the only promising fairness definitions. A
further area of research is the improvement of the models of
TCP traffic that are used by some of the rate-based congestion
control mechanisms. Existing TCP formulae are based on
several assumptions that are often not met in real
conditions. One aspect of congestion control that is not
directly related to the traffic discussed in this paper (i.e.,
streaming media traffic), but is highly relevant to congestion
control in general, is how to treat short-lived flows that
consist of only a few data packets. TCP congestion control,
as well as the congestion control schemes presented in this
paper, requires flows to persist for a certain period of time;
otherwise, these forms of congestion control are ineffective.

VIII. Conclusion 

In this paper, we presented a survey of current trends and
advancements in the area of TCP-friendly congestion control.
We discussed the need for TCP-friendly congestion control
for both non-TCP-based unicast traffic and multicast
communication, and provided an overview of the design space
for such congestion control mechanisms. This paper briefly
surveys various congestion control algorithms. It seems that
at present no single algorithm can resolve all of the problems
of congestion control in computer networks and the Internet,
and more research is needed in this direction. It should also
be noted that almost none of the surveyed papers employed
statistical techniques to verify their simulation results. We
discussed the theory of congestion, its goals and metrics, the
most common factors behind the occurrence of congestion, and
the methods to overcome congestion collapse. The paper also
briefly discussed congestion control algorithms classified by
network awareness, various commonly used congestion control
algorithms, and their protocols. Finally, it discussed
TCP-friendliness, the characteristics of TCP and non-TCP
flows, and the issues that remain to be solved.
Abstract- Document clustering is a subset of the larger field
of data clustering, which borrows concepts from the fields of
information retrieval (IR), natural language processing (NLP),
and machine learning (ML); a wide variety of unsupervised
clustering algorithms exists. This paper presents a novel
algorithm for document clustering that enhances the features
of existing algorithms. It describes the Principal Direction
Divisive Partitioning (PDDP) algorithm and its drawbacks,
introduces a combinatorial framework for the PDDP
algorithm, and then describes a simplified version of the EM
algorithm called the spherical Gaussian EM (sGEM)
algorithm. The PDDP algorithm recursively splits the data
samples into two sub-clusters using the hyperplane normal to
the principal direction derived from the covariance matrix,
which is the central logic of the algorithm. However, the
PDDP algorithm can yield poor results, especially when
clusters are not well separated from one another. To improve
the quality of the clustering results, new cluster memberships
are reallocated using the sGEM algorithm with different
settings. Furthermore, based on the theoretical background of
the sGEM algorithm, the framework can be extended in a
straightforward way to estimate the number of clusters using
the Bayesian Information Criterion. Experimental results are
given to show the effectiveness of the proposed algorithm in
comparison with the existing algorithm.

Keywords- Introduction, Document clustering via linear 
partitioning hyper planes, The proposed Spherical Gaussian 
EM algorithm, Results and Discussions conclusion and 
future work. 

I. Introduction

Clustering has been applied to various tasks in the field of
Information Retrieval, and document clustering has become
one of the most active areas of research and development.
Document clustering is a challenging problem that attempts to
discover a set of meaningful groups of documents, where the
documents within each group are more closely related to one
another than to documents assigned to different groups. The
resulting document clusters can provide a structure for
organizing large bodies of text for efficient browsing [15].

Document clustering, also referred to as text clustering, is
closely related to the concept of data clustering. It is a more
specific technique for unsupervised document organization,
automatic topic extraction, and fast information retrieval or
filtering. The process of clustering aims to discover natural
groupings, and thus to present an overview of the classes in a
collection of documents. Clustering can produce either
disjoint or overlapping partitions. In an overlapping
partition, a document may appear in multiple clusters. The
first challenge in a clustering problem is to determine which
features of a document are to be considered discriminatory.
A majority of existing clustering approaches represent each
document as a vector, thereby reducing a document to a
representation suitable for traditional data clustering
approaches [18].

A wide variety of unsupervised clustering algorithms have
been intensively studied for the document clustering
problem. Among the algorithms that remain the most
common and effective, iterative optimization clustering
algorithms have demonstrated reasonable performance for
document clustering, e.g., the Expectation Maximization
(EM) algorithm and its variants, and the well-known K-means
algorithm. The K-means algorithm can be considered a special
case of the EM algorithm [3], obtained by assuming that each
cluster is modeled by a spherical Gaussian, that each sample
is assigned to a single cluster, and that all mixing
parameters are equal. The competitive advantage of the EM
algorithm is that it is fast, scalable, and easy to implement.
Hence, EM has been chosen as the basis for the enhancement,
and a variant of Expectation Maximization, the Spherical
Gaussian EM algorithm, is proposed.

The Principal Direction Divisive Partitioning (PDDP) algorithm was developed by Boley [1]. It is a hierarchical clustering algorithm that recursively splits the data samples into two subclusters. It applies Principal Component Analysis but requires only the principal eigenvector, which is not computationally expensive. It also generates a hierarchical binary tree that inherently produces a simple taxonomic ontology. The clustering results produced by the PDDP algorithm compare favorably to other document clustering approaches, such as the agglomerative hierarchical algorithm and association rule hypergraph clustering. However, when the clusters are not well separated from one another, it can yield poor results.
The proposed methodology overcomes this disadvantage of the PDDP algorithm, which uses PCA to analyze the data, by combining it with the EM algorithm. PDDP splits the data samples into two subclusters based on the hyperplane normal to the principal direction derived from the covariance matrix of the data. When the principal direction is not representative, the corresponding hyperplane tends to produce individual clusters with wrongly partitioned contents. One practical way to deal with this problem is to run the EM algorithm on the partitioning results. A simplified version of the EM algorithm, called the spherical Gaussian EM algorithm, is presented for performing this task. Furthermore, based on the theoretical background of the spherical Gaussian EM algorithm, this framework is naturally extended to cover the problem of estimating the number of clusters using the Bayesian Information Criterion [9].

The paper is organized as follows. Section 2 briefly reviews some important background on the PDDP algorithm and addresses the problem causing the incorrect partitioning.
Section 3 presents the proposed algorithm, spherical 
Gaussian EM algorithm. Section 4 discusses the idea of 
applying the BIC to our algorithm. Section 5 explains the 
Artificial Intelligence in EM algorithm. Section 6 explains 
the data sets and the evaluation method, and shows 
experimental results. Finally, this paper concludes in Section 
7 with some directions of future work. 




Figure 1 The principal direction and the linear partitioning hyperplane on the 2d2k data set.



II Document clustering via linear partitioning hyperplanes

Consider a one-dimensional data set, e.g. real numbers on a line, and the question of how to split it into two groups. One simple procedure is the following: first find the mean value of the data set, then compare each point with the mean. If the point's value is less than the mean, it is assigned to the first group; otherwise, it is assigned to the second group. The problem arises with a multidimensional data set. Based on the idea of the PDDP algorithm, this problem can be dealt with by projecting all the data points onto the principal direction (the principal eigenvector of the covariance matrix of the data set) and then performing the split along this direction. In geometric terms, the data points are partitioned into two subclusters by the hyperplane normal to the principal direction passing through the mean vector [1]. This hyperplane is referred to as the linear partitioning hyperplane. Figure 1 illustrates the principal direction and the linear partitioning hyperplane on the 2d2k data set, containing 1000 points distributed in 2 Gaussians.

The PDDP algorithm begins with all the document vectors in a single large cluster and proceeds by recursively splitting clusters into two subclusters using the linear partitioning hyperplane, according to the discriminant function of the algorithm. Splitting stops according to some heuristic, e.g. a predefined number of clusters. The output is a binary tree whose leaf nodes form the resulting clusters. To keep this binary tree balanced, the algorithm selects the unsplit cluster to split using the scatter value, which measures the average distance from the data points in the cluster to their centroid.
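The splitting step and the scatter-based choice of which cluster to split can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Boley's implementation: it assumes a dense matrix with one document per row, computes the principal direction as the leading right singular vector of the mean-centred data (equivalent to the principal eigenvector of the covariance matrix), and uses the average-distance scatter value described above. The names `pddp_split`, `scatter` and `pddp` are hypothetical.

```python
import numpy as np

def pddp_split(X):
    """Split the rows of X by the hyperplane through the mean, normal to the principal direction."""
    centred = X - X.mean(axis=0)
    # Leading right singular vector of the centred data = principal eigenvector of the covariance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    proj = centred @ vt[0]
    return np.where(proj <= 0)[0], np.where(proj > 0)[0]

def scatter(X):
    """Average Euclidean distance from the points to their centroid."""
    return float(np.linalg.norm(X - X.mean(axis=0), axis=1).mean())

def pddp(X, k):
    """Recursively bisect until k leaf clusters exist; returns arrays of row indices."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # Split the cluster with the largest scatter (singletons cannot be split).
        i = max(range(len(clusters)),
                key=lambda j: scatter(X[clusters[j]]) if len(clusters[j]) > 1 else -1.0)
        idx = clusters.pop(i)
        left, right = pddp_split(X[idx])
        clusters += [idx[left], idx[right]]
    return clusters
```

The sketch assumes `k` does not exceed the number of distinct points; a cluster of identical points has no principal direction and would produce an empty child.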




Figure 2 Two partitions after the first iteration.




Figure 3 Three partitions after the second iteration.

The most serious problem of the PDDP algorithm is that it cannot achieve good results when clusters are not well separated from one another. Figures 2 and 3 illustrate this drawback. Figure 2 shows two partitions produced by the first iteration of the PDDP algorithm on a data set of 334 points. The actual class labels are not given, but one can observe that the data set is composed of five compact clusters [8]. From the principal direction and the corresponding linear partitioning hyperplane, it can be seen that the PDDP algorithm starts with a significantly wrong partitioning of the middle left-hand cluster. Figure 3 shows three partitions after the second iteration. If partitioning continues without any adjustment, the resulting clusters become worse. This indicates that the basic PDDP algorithm can produce poor solutions for some distributions of the data, which cannot be known in advance. In addition, the algorithm needs some information to decide whether to split a particular cluster or to stop splitting it.
III The proposed spherical Gaussian EM algorithm



It is possible to refine the partitioning results by reallocating cluster memberships. The basic idea of the reallocation method [12] is to start from some initial partitioning of the data set and then move objects from one cluster to another to obtain an improved partitioning. Thus, any iterative optimization clustering algorithm can be applied to perform this operation. The problem is formulated as a finite mixture model, and a variant of the EM algorithm is applied to learn the model.

The most critical problem is how to estimate the model parameters. The data samples are assumed to be drawn from the multivariate normal density in $\mathbb{R}^d$; we also assume that the features are statistically independent and that each component $c_j$ generates its members from a spherical Gaussian with the same covariance matrix [5]. Figure 4 gives an outline of this simplified version of the EM algorithm. The algorithm tries to maximize $\log L_c$ at every step and iterates until convergence, for example terminating when $\Delta \log L_c < \delta$, where $\delta$ is a predefined threshold.
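begin

Initialization: Set $z_{ij}^{(0)}$ from a partitioning of the data, and $t \leftarrow 0$.

repeat

E-step: For each $d_i$, $1 \le i \le n$, and $c_j$, $1 \le j \le k$, find its new component index as:

$$z_{ij}^{(t+1)} = \begin{cases} 1 & \text{if } j = \arg\max_{j'} \log P\left(c_{j'} \mid d_i; \theta^{(t)}\right) \\ 0 & \text{otherwise.} \end{cases}$$

M-step: Re-estimate the model parameters:

$$\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} z_{ij}^{(t+1)}$$

$$\mu_j^{(t+1)} = \frac{\sum_{i=1}^{n} z_{ij}^{(t+1)} d_i}{\sum_{i=1}^{n} z_{ij}^{(t+1)}}$$

$$\left(\sigma^{(t+1)}\right)^2 = \frac{1}{nd} \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij}^{(t+1)} \left\lVert d_i - \mu_j^{(t+1)} \right\rVert^2$$

$t \leftarrow t + 1$

until $\Delta \log L_c(\theta) < \delta$;

end

Figure 4 Outline of the sGEM algorithm.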
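A compact NumPy sketch of the hard-assignment loop in Figure 4 follows. It assumes documents are rows of a dense matrix, that the initial labels come from a partitioning such as PDDP, and that all components share one spherical variance. Computing the parameters of the current partition first and then reassigning is equivalent to the E-step/M-step order in Figure 4. The names `m_step`, `log_joint` and `sgem`, the tolerance default, and the handling of empty components (zero weight) are hypothetical choices.

```python
import numpy as np

def m_step(X, z, k):
    """Re-estimate mixing weights, centroids and the single shared variance."""
    n, d = X.shape
    counts = np.bincount(z, minlength=k).astype(float)
    weights = counts / n
    mu = np.zeros((k, d))
    for j in range(k):
        if counts[j] > 0:                       # empty components keep a zero centroid
            mu[j] = X[z == j].mean(axis=0)
    sq = float(((X - mu[z]) ** 2).sum())
    sigma2 = max(sq / (n * d), 1e-12)           # floor avoids division by zero
    return weights, mu, sigma2

def log_joint(X, weights, mu, sigma2):
    """log pi_j + log N(d_i; mu_j, sigma2 * I) for every document i and component j."""
    d = X.shape[1]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    with np.errstate(divide="ignore"):
        log_w = np.log(weights)                 # empty components get -inf
    return log_w[None, :] - 0.5 * d * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)

def sgem(X, labels, k, max_iter=100, tol=1e-6):
    """Hard-assignment EM started from an existing partition (e.g. PDDP output)."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(labels, dtype=int).copy()
    prev = -np.inf
    log_lc = prev
    for _ in range(max_iter):
        weights, mu, sigma2 = m_step(X, z, k)   # parameters of the current partition
        lj = log_joint(X, weights, mu, sigma2)
        z = lj.argmax(axis=1)                   # E-step: move each document to its best component
        log_lc = float(lj[np.arange(len(X)), z].sum())
        if log_lc - prev < tol:                 # stop when Delta log L_c < delta
            break
        prev = log_lc
    return z, log_lc
```

Started from a partition in which one point was placed on the wrong side, the loop moves it back to the compact group it belongs to.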



A. Estimating Number Of Document Clusters 



When the clustering algorithm is applied to a new data set about whose contents little is known, fixing a predefined number of clusters is too strict and inefficient for discovering the latent cluster structure. The finite mixture model of the EM algorithm allows the problem of estimating the number of clusters in the data set to be addressed with a model selection technique, the Bayesian Information Criterion (BIC) [9]. In general, the problem of model selection is to choose the best model among a set of candidate models.

The BIC contains two components: the first term measures how well the parameterized model predicts the data, and the second term penalizes the complexity of the model [4]. The selected model is the one with the largest value of the BIC:

$$M^* = \arg\max_i \mathrm{BIC}(M_i)$$

The value of the first term of the BIC is obtained directly from running the sGEM algorithm; it can also be computed from the data according to the partitioning. The number of parameters is the sum of $k - 1$ component probabilities, $k \cdot d$ centroid coordinates, and 1 variance.
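The text does not spell out the exact BIC expression; a common choice, assumed here, is the Schwarz form $\log \hat{L} - \frac{p}{2}\log n$ with $p = (k-1) + kd + 1$ free parameters as counted above. The sketch evaluates it for a hard partition using the classification log-likelihood of the shared-variance spherical Gaussian model. `bic_from_partition` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def bic_from_partition(X, labels):
    """Schwarz-form BIC of a hard partition under the shared-variance spherical Gaussian model."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    ks, counts = np.unique(labels, return_counts=True)
    k = len(ks)
    centroids = np.array([X[labels == j].mean(axis=0) for j in ks])
    pos = np.searchsorted(ks, labels)            # row of each point's centroid
    sq = float(((X - centroids[pos]) ** 2).sum())
    sigma2 = max(sq / (n * d), 1e-12)            # ML estimate of the single variance
    loglik = (float((counts * np.log(counts / n)).sum())
              - 0.5 * n * d * np.log(2 * np.pi * sigma2)
              - sq / (2 * sigma2))
    p = (k - 1) + k * d + 1                      # mixing weights + centroids + variance
    return loglik - 0.5 * p * np.log(n)
```

Selecting $M^*$ then amounts to `max(candidates, key=lambda lab: bic_from_partition(X, lab))` over a list of candidate labelings.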

Boley’s subsequent work [2] also suggests a dynamic threshold called the centroid scatter value (CSV) for estimating the number of clusters. This criterion is based on the distribution of the data. Since PDDP is a divisive hierarchical clustering algorithm, it produces new clusters by successively splitting existing ones. As the algorithm proceeds, the clusters get smaller, and so does the maximum scatter value of any individual cluster. The idea of the CSV is to compute the overall scatter value of the data by treating the collection of centroids as individual data vectors. This stopping test terminates the algorithm as soon as the CSV exceeds the maximum cluster scatter value.
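The CSV stopping test can be sketched as follows, assuming that a cluster's scatter is the average distance of its points to their centroid, as in the scatter value described earlier; Boley's own definition of scatter may differ, and `csv_should_stop` is a hypothetical name.

```python
import numpy as np

def scatter(X):
    """Average Euclidean distance from the points to their centroid."""
    return float(np.linalg.norm(X - X.mean(axis=0), axis=1).mean())

def csv_should_stop(X, clusters):
    """True once the scatter of the centroids exceeds every individual cluster's scatter."""
    X = np.asarray(X, dtype=float)
    centroids = np.array([X[idx].mean(axis=0) for idx in clusters])
    csv = scatter(centroids)                      # centroids treated as data vectors
    max_cluster_scatter = max(scatter(X[idx]) for idx in clusters)
    return csv > max_cluster_scatter
```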

The CSV captures only the overall improvement, whereas the BIC can be used to measure the improvement in both the local and the global structure. As mentioned earlier, the splitting process needs some information to decide whether to split a cluster into two subclusters or keep its current structure. The BIC is first calculated locally, when the algorithm performs the splitting test on the cluster, and then globally, to measure the improvement in the overall structure. If both the local and global BIC scores improve, the cluster is split into two child clusters.
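A minimal sketch of this local/global decision rule follows. It assumes the same Schwarz-form BIC as for model selection, a shared-variance spherical Gaussian model, and a proposed bisection given as a boolean mask over the cluster's members (for example, produced by the PDDP hyperplane). The names `bic` and `accept_split` are hypothetical.

```python
import numpy as np

def bic(X, labels):
    """Schwarz-form BIC of a hard partition (shared-variance spherical Gaussians)."""
    n, d = X.shape
    ks, counts = np.unique(labels, return_counts=True)
    centroids = np.array([X[labels == j].mean(axis=0) for j in ks])
    sq = float(((X - centroids[np.searchsorted(ks, labels)]) ** 2).sum())
    sigma2 = max(sq / (n * d), 1e-12)
    loglik = (float((counts * np.log(counts / n)).sum())
              - 0.5 * n * d * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2))
    p = (len(ks) - 1) + len(ks) * d + 1
    return loglik - 0.5 * p * np.log(n)

def accept_split(X, labels, target, child_mask, new_label):
    """Split cluster `target` only if both the local and the global BIC improve."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    members = np.where(labels == target)[0]
    child_mask = np.asarray(child_mask, dtype=bool)
    # Local test: one cluster versus its two proposed children, on the cluster's members only.
    local_gain = (bic(X[members], child_mask.astype(int))
                  > bic(X[members], np.zeros(len(members), dtype=int)))
    # Global test: the whole partition before and after the split.
    proposed = labels.copy()
    proposed[members[child_mask]] = new_label
    global_gain = bic(X, proposed) > bic(X, labels)
    return bool(local_gain and global_gain)
```

Splitting two well-separated groups passes both tests, whereas bisecting a single compact square of points does not.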

IV Results and discussions

• Data Sets And Setup Information 

The 20 Newsgroups data set consists of 20000 articles evenly divided among 20 different discussion groups [10]. It was collected from UseNet postings over a period of several months. Many categories fall into confusable clusters; for example, five of them are computer discussion groups, and three of them discuss religion. The Bow toolkit [11] is used to construct the term-document matrix (in sparse format). The UseNet headers are used, and stop words and low-frequency words (occurring fewer than 2 times) are eliminated. The result is a 59965 × 19950 term-document matrix for this data set.

The well-known tf-idf term weighting technique is also applied. Let $d_i = (w_{i1}, w_{i2}, \ldots, w_{im})^T$, where $m$ is the total number of unique terms. The tf-idf score of each $w_{ik}$ is computed as

$$w_{ik} = \mathrm{tf}_{ik} \cdot \log\left(\frac{n}{\mathrm{df}_k}\right)$$

where $\mathrm{tf}_{ik}$ is the frequency of term $k$ in $d_i$, $n$ is the total number of documents in the corpus, and $\mathrm{df}_k$ is the number of documents in which term $k$ occurs. Finally, each document vector is normalized using the $L_2$ norm.

For the purpose of comparison, the basic PDDP algorithm is chosen as the baseline. The number of clusters k is varied in the range [2, 2k], and no stopping criterion was used. Both the CSV and the BIC were then applied to these settings in order to test the estimation of the number of clusters.
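The tf-idf weighting and $L_2$ normalization can be sketched with NumPy as below. The sketch assumes a small dense documents × terms count matrix; the experiments use a sparse terms × documents matrix built with Bow, so in practice one would transpose it and use a sparse library. `tfidf_l2` is a hypothetical name.

```python
import numpy as np

def tfidf_l2(counts):
    """w_ik = tf_ik * log(n / df_k), then scale each document (row) to unit L2 norm."""
    counts = np.asarray(counts, dtype=float)   # rows = documents, columns = terms
    n = counts.shape[0]
    df = (counts > 0).sum(axis=0)              # number of documents containing each term
    idf = np.log(n / np.maximum(df, 1))        # guard: unused terms have all-zero columns anyway
    W = counts * idf
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                    # leave all-zero documents unchanged
    return W / norms
```

A term that occurs in every document gets weight $\log(n/n) = 0$, so it contributes nothing after weighting.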

• Evaluation Method 

Since all the documents are already categorized, evaluation can be performed by comparing the clustering results with the true class labels. In our experiments, the normalized mutual information (NMI) is used [16]. In the context of document clustering, mutual information can be used as a symmetric measure for quantifying the degree of relatedness between the generated clusters and the actual categories. In particular, when the number of clusters differs from the actual number of categories, mutual information is very useful because it is not biased towards smaller clusters.


Data set | Criterion | Algorithm | k found | NMI | Time (sec.)
20 Newsgroups | CSV | PDDP | 34 | 0.443 | 15.838
20 Newsgroups | CSV | sGEM | 34 | 0.482 | 105.39
20 Newsgroups | BIC | PDDP | 25 | 0.426 | 14.70
20 Newsgroups | BIC | sGEM | 25 | 0.463 | 78.45

Table 1: Clustering results with different stopping criteria on the 20 Newsgroups data set.

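When this criterion is normalized to take values between 0 and 1, the NMI can be calculated as follows:

$$\mathrm{NMI} = \frac{\sum_{h,l} n_{h,l} \log\left(\frac{n \cdot n_{h,l}}{n_h n_l}\right)}{\sqrt{\left(\sum_{h} n_h \log\frac{n_h}{n}\right)\left(\sum_{l} n_l \log\frac{n_l}{n}\right)}}$$

where $n$ is the total number of documents, $n_h$ is the number of documents in category $h$, $n_l$ is the number of documents in cluster $l$, and $n_{h,l}$ is the number of documents in both category $h$ and cluster $l$. The NMI value is 1 when the clustering results exactly match the true class labels, and close to 0 for a random partitioning [17].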
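The NMI formula can be computed from two label vectors as sketched below. `nmi` is a hypothetical helper, and returning 0 when either labeling has a single group (a zero denominator) is an assumption, since the formula is undefined in that case.

```python
import numpy as np

def nmi(classes, clusters):
    """Normalized mutual information with the geometric-mean normalization given above."""
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    n = len(classes)
    _, h = np.unique(classes, return_inverse=True)
    _, l = np.unique(clusters, return_inverse=True)
    table = np.zeros((h.max() + 1, l.max() + 1))
    np.add.at(table, (h, l), 1)                      # n_{h,l} contingency counts
    n_h, n_l = table.sum(axis=1), table.sum(axis=0)
    nz = table > 0                                   # 0 * log(0) terms are dropped
    num = (table[nz] * np.log(n * table[nz] / np.outer(n_h, n_l)[nz])).sum()
    den = np.sqrt((n_h * np.log(n_h / n)).sum() * (n_l * np.log(n_l / n)).sum())
    return float(num / den) if den > 0 else 0.0
```

Relabeling the clusters does not change the score, and a clustering that is independent of the categories scores 0.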

• Experimental Results

Figure 5 shows the clustering results on the 20 Newsgroups data set. On this data set, the proposed algorithm performs better than the basic PDDP algorithm (NMI 0.482 versus 0.443 with the CSV, and 0.463 versus 0.426 with the BIC; Table 1). However, performing global refinement after local refinement, as in EM, degrades the quality of the clustering results: global refinement with the sGEM algorithm leads to more decisions to move each document from its cluster to other candidate clusters.




Figure 5: NMI results on the 20 Newsgroups data set. 






Cj | Purity | Entropy | Non-zero counts over the categories H, Ep, Ec, Em, Ei, Ef, Emu, Et, Ev, Ea, Er, Eo, Emm, Ecu, Es, E, S, P, T, B (in column order)
9 | 1.000 | 0.000 | 25
10 | 1.000 | 0.000 | 30
2 | 0.998 | 0.005 | 488, 1
1 | 0.978 | 0.036 | 3, 132
3 | 0.900 | 0.137 | 1, 1, 54, 4
5 | 0.878 | 0.166 | 5, 2, 5, 86
7 | 0.865 | 0.184 | 4, 45, 1, 1, 1
8 | 0.719 | 0.363 | 82, 4, 3, 12, 5, 1, 1, 5, 1
11 | 0.718 | 0.308 | 1, 1, 28, 1, 1, 7
6 | 0.680 | 0.351 | 3, 6, 8, 1, 85, 21, 1
4 | 0.425 | 0.372 | 1, 1, 48, 44, 19
0 | 0.216 | 0.837 | 4, 128, 37, 17, 54, 229, 112, 67, 31, 21, 157, 9, 14, 44, 18, 8, 9, 58, 11, 32

Table 2 Confusion matrix generated using sGEM and the BIC




Global Journal of Computer Science and Technology 



Vol. 9 Issue 5 (Ver 2.0), January 2010 Page | 92 



Cj | Purity | Entropy
4 | 1.000 | 0.000
7 | 0.995 | 0.010
3 | 0.994 | 0.013
3 | 0.992 | 0.015
10 | 0.893 | 0.139
1 | 0.564 | 0.453
0 | 0.517 | 0.231
5 | 0.517 | 0.537
9 | 0.507 | 0.377
12 | 0.435 | 0.333
2 | 0.430 | 0.312
11 | 0.474 | 0.536
6 | 0.337 | 0.492
14 | 0.309 | 0.695
13 | 0.209 | 0.796

Table 3 Confusion matrix generated using sGEM and the CSV

V Conclusion and future work

This paper presents several strategies for improving the basic PDDP algorithm. When the principal direction is not representative, the corresponding hyperplane tends to produce individual clusters with wrongly partitioned contents. By formulating the problem as a finite mixture model, the sGEM algorithm refines the partitioning results and achieves a higher NMI than the PDDP algorithm under both the CSV and the BIC stopping criteria. Preliminary experimental results on two different document sets are very encouraging.

In future work, we intend to investigate other model selection techniques for approximating the number of underlying clusters. Recent work [7] has demonstrated that estimating the number of clusters in the k-means algorithm using the Anderson-Darling test yields very promising results and seems to outperform the BIC. This statistical test could also be applied to further enhance the proposed algorithm.
Abstract- The proposed method addresses the classification problem of diagnosing Type II diabetes mellitus using an improved gradient descent back propagation algorithm. The objective of this research is to increase the performance of the network in terms of accuracy. Accuracy was increased using three key concepts: missing data replacement, data preprocessing, and the introduction of a Performance Vector (PV) in the search direction. The network was tested on the Pima Indian Diabetes Dataset. The experimental system improves accuracy by more than 7% over the standard gradient descent method.
Keywords- Diabetes Mellitus, Gradient Descent, Missing Data Replacement.

I Introduction 

Diabetes mellitus is a large and growing health problem: it is the fourth biggest cause of death worldwide, particularly in industrial and developing countries [Rajeeb Dey and Vaibhav Bajpai, Gagan Gandhi and Barnali Dey]. It is one of the most common chronic diseases and can lead to serious long-term complications and death. There are two major types of diabetes: Type I and Type II. Type I diabetes is usually diagnosed in children and young adults and was previously known as juvenile diabetes [Siti Farhanah Bt Jaafar and Darmawaty Mohd Ali]. Type II diabetes is the most common form of diabetes.

The goal of designing intelligent systems with human capabilities is the starting point for the design of Artificial Neural Networks (ANNs). Artificial neural networks are computational systems whose architecture and operation are inspired by knowledge about biological neural cells (neurons) in the brain [Madiha J.Jafri, Vince D. Calhoun]. An ANN is a network of many simple processors, called units, linked to certain neighbors with varying coefficients of connectivity, called weights, that represent the strength of these connections. The basic unit of an ANN, called an artificial neuron, simulates the basic functions of natural neurons: it receives inputs, processes them by simple connections and threshold operations, and outputs a result.



ANNs have been used successfully to solve classification problems in several domains, and the back propagation algorithm is very often the favorite for training feed forward neural networks [T.Jayalakshmi, A.Santhakumaran]. Figure 1 shows the schematic representation of a multilayer perceptron with eight input neurons, two hidden layers with eight hidden neurons, and one output layer with a single neuron. Each input neuron connects to each of the hidden neurons, and each hidden neuron connects to the output neuron.




Figure 1 Schematic representation of a multilayer perceptron

Gradient-based methods are among the most widely used error minimization methods for training back propagation networks. The back propagation algorithm is a classical domain-dependent technique for supervised training. It works by measuring the output error, calculating the gradient of this error, and adjusting the ANN weights and biases in the direction of descending gradient. Back propagation is the most commonly used and simplest algorithm for training feed forward networks for classification.

This paper suggests a simple modification of the search direction vector to improve training efficiency, with the aim of maximizing the accuracy of Type II diabetes classification. The paper is organized as follows: Section 2 discusses the improved gradient descent method, Section 3 describes the experimental results, and Section 4 concludes the paper.
II Background study

Chee-peng Lim, Jenn-Hwai Leong and Mei-Ming Kuan proposed a hybrid neural network comprising Fuzzy ARTMAP (FAM) and Fuzzy C-Means (FCM) clustering for pattern classification with incomplete training and test data. To handle missing data in the training samples, they propose a three-phase procedure. First, FAM is trained with the complete training samples. Next, training samples with missing features are presented, and the missing values are estimated and replaced using two FCM-based algorithms. Finally, network training is conducted using all complete and estimated samples. To handle test samples with missing features, a non-substitution FCM-based approach is employed to yield a predicted output quickly. Marisol Giardina, Yongyang Huo, Francisco Azuaje, Paul McCullagh and Roy Harper investigated data acquired from diabetic patients at the Ulster Hospital in Northern Ireland in terms of descriptive statistical indicators and missing values, and made a comparative study of several missing value estimation techniques. Their paper reported an exploratory statistical analysis of Type II diabetes databases, including a comparison of missing value estimation methods, a problem that has received relatively little attention from the medical information community. That study is part of the preprocessing phase in the development of supervised and unsupervised machine learning systems for assessing coronary heart disease risk in diabetic patients. HT Nguyen, M Butler, A Roychoudhry, AG Shannon, J Rack and P Mitchell proposed and developed an integrated system for the classification of diabetic retinopathy using a multilayer feed forward neural network; the principal advantages of automated grading are quantitative accuracy and repeatability. Md Monirul Islam, Md Faijul Amin, Suman Ahmmed and Kazuyuki Murase describe an adaptive merging and pruning algorithm for designing ANNs. This algorithm prunes hidden neurons by merging and adds hidden neurons by splitting, repeatedly or alternately. The decision of when to merge or add hidden neurons depends entirely on the improvement in the hidden neurons' learning ability or on the training progress of the ANN, respectively. Aurangzeb Khan and Kenneth Revett describe how rough set theory can be utilized as a tool for analyzing relatively complex decision tables such as the Pima Indian Diabetes Database. They conclude that, in future work, the missing values filled with 0's could be corrected to improve the accuracy figure. Rajeeb Dey, Vaibhav Bajpai, Gagan Gandhi and Barnali Dey present a classification approach to the diagnosis of diabetes mellitus using the back propagation algorithm of artificial neural networks. The database used for training and testing the ANNs was collected from the Sikkim Manipal Institute of Medical Sciences Hospital. They report that the effectiveness of data normalization in terms of network performance is clearly reflected in the results. Xingbo Sun and Pingxian Yang proposed a novel variant of the sigmoid activation function with four parameters. An improved BP algorithm based on this function is derived and discussed, and its efficiency and advantages are demonstrated by the classification results for Chinese wine micrographs obtained with the improved and traditional BPNN.

Their activation function can adjust the step, position and mapping scope simultaneously, so it has stronger non-linear mapping capabilities. Michael Rimer and Tony Martinez present classification-based objective functions, an approach to training artificial neural networks on classification problems that directly minimizes classification error by back propagating error only on misclassified patterns from culprit output nodes. Mehmet Onder Efe presents a comparison of neuronal activation functions utilized mostly in neural network applications. That paper dwells on the widely used neuronal activation functions as well as two new ones composed of sines and cosines, and a sinc function characterizing the firing of a neuron. Pasi Luukka studies the suitability of similarity derived from Yu's norms for use in a similarity classifier. A similarity classifier usually uses similarity based on the Łukasiewicz structure with a generalized mean, and has proved to be a good method for classifying medical data sets. He also tested two different preprocessing methods, PCA and entropy minimization, and their effects.

III Methodology

The performance of the proposed method is demonstrated by employing the improved gradient descent method. The performance criterion used in this research is classification accuracy, and this paper compares the normal gradient descent method and the improved gradient descent method in terms of accuracy.

A. Pima Indian Diabetes Dataset 

The Pima Indian Diabetes dataset contains 768 samples for a two-class problem: diagnosing whether a patient would test positive or negative for diabetes. The diagnosis is based on personal data (age, number of times pregnant) and the results of medical examinations (blood pressure, body mass index, result of the glucose tolerance test, etc.). There are 500 samples of class 1 and 268 of class 2, with eight attributes for each sample. The data set is difficult to classify [Suman Ahmmed, Khondaker Abdullah. A Mamum and Monirul Islam].

B. Network Architecture 

The proposed method was implemented with a four-layer feed forward back propagation neural network, i.e. one input layer, two hidden layers and one output layer. The architecture has eight input neurons, eight hidden neurons and one output neuron, which classifies the presence or absence of diabetes. Gradient descent training was used to train the network, minimizing the mean square error between the network output and the actual output. During training, the tan sigmoid activation function is used for the hidden and output layers. The learning rate is initialized to 0.01, the performance goal is 1e-08, and the number of epochs is 500. Weights and biases are initialized to random values in the range -1 to +1; initializing the weights with small values prevents saturation.
C. Missing Data Replacement

Neural network training can be made more efficient by performing certain preprocessing steps on the network inputs and targets. Network input processing functions transform inputs into a form better suited to the network. The first key concept used in this research is missing data analysis. Missing data make analysis and decision making difficult, and because decision making depends heavily on these data, estimation methods that are accurate and efficient are required. The data set used in this research contains missing values, which are common in medical data. The proposed method converts the incomplete data into a complete data set using the K-nearest neighbor method, which replaces missing values in the data with the corresponding value from the nearest-neighbor column. The nearest-neighbor column is the closest column in Euclidean distance. If the corresponding value in the nearest-neighbor column is also missing, the next nearest column is used.
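The column-wise nearest-neighbor rule described above can be sketched as follows. Assumptions: missing values are encoded as `NaN`, the distance between two columns is the Euclidean distance over the rows where both are observed, and only a single neighbor (k = 1) is used, with fallback to the next-nearest column. The text does not say which axis holds samples and which holds attributes, so the function operates on whatever columns it is given; `knn_impute_columns` is a hypothetical name.

```python
import numpy as np

def knn_impute_columns(data):
    """Fill NaNs from the nearest column, falling back to the next-nearest if that is also NaN."""
    X = np.asarray(data, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    n_cols = X.shape[1]
    for j in range(n_cols):
        missing_rows = np.where(~observed[:, j])[0]
        if missing_rows.size == 0:
            continue
        # Distance from column j to every other column over their commonly observed rows.
        dists = np.full(n_cols, np.inf)
        for c in range(n_cols):
            both = observed[:, j] & observed[:, c]
            if c != j and both.any():
                dists[c] = np.sqrt(((X[both, j] - X[both, c]) ** 2).sum())
        order = np.argsort(dists)              # nearest first; unusable columns sort last
        for r in missing_rows:
            for c in order:
                if np.isfinite(dists[c]) and observed[r, c]:
                    out[r, j] = X[r, c]        # copy the neighbor's value for this row
                    break
    return out
```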

D. Data Preprocessing 

The second key concept is data preprocessing. Preprocessing the raw inputs has a great effect on making the data suitable for training; without it, training the neural network would have been very slow. Preprocessing can scale the data to the same range of values for each input feature in order to minimize bias within the neural network toward one feature over another. It can also speed up training by starting the training process with every feature on the same scale, which is especially useful for modeling applications where the inputs are on widely different scales. The proposed method preprocesses the data using Principal Component Analysis (PCA), a very popular preprocessing method. Principal component normalization is based on the premise that the salient information in a given set of features lies in those features that have the largest variance. This means that, for a given set of data, the features that exhibit the most variance are the most descriptive for determining differences between sets of data. This is accomplished by eigenvector analysis of either the covariance matrix or the correlation matrix of the data.
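A minimal PCA preprocessing sketch follows. The inputs are standardized, so the eigen-analysis is effectively on the correlation matrix. They are then projected onto the eigenvectors of their covariance matrix and ordered by decreasing variance. The standardization step, the optional `n_components` parameter and the name `pca_preprocess` are assumptions; the text does not say how many components were kept.

```python
import numpy as np

def pca_preprocess(X, n_components=None):
    """Standardize, then project onto principal components ordered by decreasing variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                          # constant features are left centred only
    Z = (X - X.mean(axis=0)) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))   # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:
        eigvals, eigvecs = eigvals[:n_components], eigvecs[:, :n_components]
    return Z @ eigvecs, eigvals
```

The projected features are uncorrelated, and their variances equal the returned eigenvalues.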

E. Improved Gradient Descent Algorithm 

Gradient descent is the most widely used class of algorithms for supervised learning of neural networks. The most popular training algorithm of this category is batch back propagation, a first-order method that minimizes the error function by updating the weights using the steepest descent method:

$$w(t+1) = w(t) - \eta \nabla E(w(t))$$

where $E$ is the batch error measure and $\nabla E$ is the gradient vector, which is computed by applying the chain rule through the layers of the feed forward neural network. The parameter $\eta$ is a heuristic called the learning rate; its optimal value depends on the shape of the error function. The improved gradient descent algorithm can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate the derivatives of performance with respect to the weight, bias, and Performance Vector (PV). Each variable is adjusted according to gradient descent, with the step $dX$ calculated as

$$dX = \eta \cdot \nabla E \cdot PV$$

where $X$ denotes the weight and bias values and $dX$ is the search direction vector. PV is the Performance Vector, which takes values in the range $10 < PV < 100$ and improves classification accuracy.
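To make the two update rules concrete, the sketch below applies one plain steepest-descent step and one PV-scaled step to the same parameters. It assumes PV is a scalar (or an array broadcastable to the parameters) and that $dX$ is subtracted from $X$, so the step is a descent step. `gd_step` and `pv_step` are hypothetical names.

```python
import numpy as np

def gd_step(x, grad, lr=0.01):
    """Steepest descent: w(t+1) = w(t) - eta * grad E(w(t))."""
    return x - lr * grad

def pv_step(x, grad, lr=0.01, pv=20.0):
    """Improved step: dX = eta * grad E * PV, applied as a descent step."""
    pv_arr = np.asarray(pv, dtype=float)
    if np.any(pv_arr <= 10) or np.any(pv_arr >= 100):
        raise ValueError("PV is stated to lie in the range 10 < PV < 100")
    dX = lr * grad * pv_arr
    return x - dX
```

When PV is a scalar, $\eta \cdot PV$ acts as an effective learning rate, so with $\eta = 0.01$ the range $10 < PV < 100$ corresponds to effective learning rates between 0.1 and 1.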

Algorithm 

i. Create an architecture consisting of eight input nodes
in the input layer, eight hidden nodes in two hidden
layers, and one output node in the output layer. Assign
the nodes to each layer.

ii. Replace the missing data using the K-nearest neighbor
method.

iii. Preprocess the input data using PCA.

iv. Initialize the weights and biases to random values.

v. Initialize the network parameters.

vi. Calculate the gradient step using

$$dX = \eta \cdot \nabla E \cdot PV$$

vii. Train the network with the initialized parameters and
the sigmoid activation function.

viii. Calculate the error using the mean squared error (MSE).

ix. Repeat the process until the maximum number of epochs
is reached, the desired output is obtained, or the
minimum gradient is reached.
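The algorithm steps above can be sketched as a small NumPy training loop. Several details are assumptions made only for illustration: eight nodes in each of the two hidden layers (the text can also be read as eight in total), the initial weight scale, the synthetic stand-in data, the hypothetical function names `train` and `predict`, and the reading of PV as a scalar step multiplier. Missing-value replacement (step ii) and PCA (step iii) are assumed to have been applied to X beforehand. The paper's own experiments were run in MATLAB, not with this code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, hidden=(8, 8), eta=0.025, pv=20.0, max_epochs=5000,
          goal=1e-8, min_grad=1e-10, seed=0):
    """Hypothetical sketch of steps iv-ix: batch gradient descent on a sigmoid MLP."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]            # inputs -> hidden layers -> one output
    # Step iv: random initial weights, zero biases.
    W = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    y = y.reshape(-1, 1)
    history = []
    for _ in range(max_epochs):
        # Step vii: forward pass with sigmoid activations in every layer.
        acts = [X]
        for Wk, bk in zip(W, b):
            acts.append(sigmoid(acts[-1] @ Wk + bk))
        out = acts[-1]
        mse = float(np.mean((out - y) ** 2))    # Step viii: mean squared error
        history.append(mse)
        if mse <= goal:                         # Step ix: performance goal reached
            break
        # Backpropagation: delta holds dE/dz for the current layer.
        delta = 2.0 * (out - y) / len(X) * out * (1.0 - out)
        grads_W, grads_b = [None] * len(W), [None] * len(W)
        for k in range(len(W) - 1, -1, -1):
            grads_W[k] = acts[k].T @ delta
            grads_b[k] = delta.sum(axis=0)
            if k > 0:
                delta = (delta @ W[k].T) * acts[k] * (1.0 - acts[k])
        grad_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads_W + grads_b))
        if grad_norm < min_grad:                # Step ix: minimum gradient reached
            break
        # Step vi: dX = eta * grad E * PV, applied against the gradient.
        for k in range(len(W)):
            W[k] -= eta * pv * grads_W[k]
            b[k] -= eta * pv * grads_b[k]
    return W, b, history

def predict(W, b, X):
    a = X
    for Wk, bk in zip(W, b):
        a = sigmoid(a @ Wk + bk)
    return (a.ravel() >= 0.5).astype(int)       # threshold the sigmoid output

# Synthetic stand-in for the preprocessed data: 200 samples, 8 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b, history = train(X, y, max_epochs=2000)
print(history[0], history[-1])
print("training accuracy:", np.mean(predict(W, b, X) == y))
```

The three stopping tests mirror step ix: a maximum epoch count, an MSE goal, and a minimum gradient norm.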

IV Experimental Results

A computer simulation was developed to study the improved
gradient descent method with reconstruction of missing
values, preprocessing of data, and the effectiveness of the
performance vector. The simulations were carried out in
MATLAB. Various networks were developed and tested with
random initial weights. The network was trained five times;
the performance goal was reached at different epochs in each
run, and the classification accuracy was measured. The
results of standard gradient descent and improved gradient
descent are shown in the performance table (Table 1). The
gradient descent neural network investigation uses the Pima
Indian dataset.

To evaluate the performance of the network, the entire
sample was randomly divided into training and test samples
using the standard 80/20 rule: 80% of the samples are used
for training and 20% for testing. In this classification
method, the training process is considered successful when
the MSE reaches 1e-08. Conversely, the training process
fails to converge when it reaches the maximum training time
before reaching the desired MSE. The training time of an
algorithm is defined as the number of epochs required to
meet the stopping criterion.
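The evaluation protocol just described, a random 80/20 split followed by a convergence test against an MSE goal of 1e-08 within a maximum number of epochs, can be sketched in plain Python. The function names `split_80_20` and `training_outcome`, the default epoch limit, and the assumption that the MSE history covers the whole run are hypothetical choices for illustration.

```python
import random

def split_80_20(samples, seed=None):
    """Hypothetical helper: shuffle indices, put 80% in training and 20% in test."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(0.8 * len(samples))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

def training_outcome(mse_history, goal=1e-8, max_epochs=5000):
    """Return ('converged', epochs needed) or ('failed', max_epochs)."""
    for epoch, mse in enumerate(mse_history[:max_epochs], start=1):
        if mse <= goal:
            return "converged", epoch   # training time = epochs to meet the goal
    return "failed", max_epochs         # hit the epoch limit before the goal

train_set, test_set = split_80_20(list(range(10)), seed=42)
print(len(train_set), len(test_set))          # 8 2
print(training_outcome([1e-3, 1e-6, 5e-9]))   # ('converged', 3)
```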
No. of runs    Standard GD accuracy (%)    Improved GD accuracy (%)
1              98.0392                     100
2              95.4248                     100
3              93.4641                     99.3464
4              87.5817                     100
5              92.1569                     99.3464
Average        93.33                       99.73

Table 1. Performance table
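The Average row of Table 1 can be checked from the five per-run accuracies. The short computation below uses only the values reported in the table. It gives 93.3333 for standard gradient descent and 99.7386 for improved gradient descent; the latter appears in the table as 99.73, i.e. truncated rather than rounded to two decimals.

```python
from statistics import mean

# Per-run classification accuracies (%) copied from Table 1.
standard_gd = [98.0392, 95.4248, 93.4641, 87.5817, 92.1569]
improved_gd = [100, 100, 99.3464, 100, 99.3464]

print(round(mean(standard_gd), 4))  # 93.3333
print(round(mean(improved_gd), 4))  # 99.7386
```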

V Conclusion 

This paper presents an improved gradient descent approach
for classifying diabetes data. The improved gradient descent
algorithm has three key components: a technique for
replacing missing values, data preprocessing, and the
introduction of the performance vector. The computational
model used in this paper classifies type II diabetes using
the Pima Indian dataset. The algorithm achieves a better
average classification accuracy than the standard gradient
descent method (99.73% versus 93.33% in Table 1).